Abstract

In this paper, we adopt a cross layer design approach for analyzing the throughput-delay
tradeoff of the multicast channel in a single cell system. To illustrate the main ideas, we
start with the single group case, i.e., pure multicast, where a common information stream is
requested by all the users. We consider three classes of scheduling algorithms with progressively
increasing complexity. The first class strives for minimum complexity by resorting to a static
scheduling strategy along with memoryless decoding. Our analysis for this class of scheduling
algorithms reveals the existence of a static scheduling policy that achieves the optimal scaling
law of the throughput at the expense of a delay that increases exponentially with the number
of users. The second scheduling policy resorts to a higher complexity incremental redundancy
encoding/decoding strategy to achieve a superior throughput-delay tradeoff. The third, and
most complex, scheduling strategy benefits from the cooperation between the different users to
minimize the delay while achieving the optimal scaling law of the throughput. In particular,
the proposed cooperative multicast strategy is shown to simultaneously achieve the optimal
scaling laws of both throughput and delay. Then, we generalize our scheduling algorithms to
exploit the multi-group diversity available when different information streams are requested
by different subsets of the user population. Finally, we discuss the potential gains
of equipping the base station with multi-transmit antennas and present simulation results that
validate our theoretical claims.

1 Introduction

Traditional information theoretic investigations pay little, if any, attention to the notion of delay. 
Clearly, this approach is not adequate for many applications, especially those with strict Quality 
of Service (QoS) constraints. To avoid this shortcoming, recent years have witnessed a growing 
interest in cross layer design approaches. The underlying idea in these approaches is to jointly 
optimize the physical, data link, and networking layers in order to satisfy the QoS constraints with 
the minimum expenditure of network resources. Early investigations on cross layer design have 
focused on the single user case [1,2]. These works have shed light on the fundamental tradeoffs in 
this scenario and devised efficient power and rate control policies that approach these limits. More 
recent works have considered multi-user cellular networks [3-6]. These studies have enhanced our 
understanding of the fundamental limits and the structure of optimal resource allocation strategies. 
Here, we take a first step towards generalizing this cross layer approach to the wireless multicast 
scenario. This scenario is characterized by a strong interaction between the network, medium access,
and physical layers. This interaction adds significant complexity to the problem, which motivated
the adoption of a simplified on-off model for the wireless channel in several of the recent works 
on wireless multicast [7-9]. In the sequel, we argue that employing more accurate models for 
the wireless channel allows for valuable opportunities for exploiting the wireless medium to yield 
performance gains. More specifically, our work sheds light on the role of the following characteristics 
of the wireless channel in the design of multicast scheduling strategies: 1) The multi-user diversity 
resulting from the statistically independent channels seen by the different users [10], 2) The wireless 
multicast gain resulting from the fact that any information transmitted over the wireless channel 
is overheard by all users, possibly with different attenuation factors, and 3) The cooperative gain 
resulting from antenna sharing between users [11]. 

To illustrate the main ideas, we first focus on the single group (pure multicast) scenario where 
the same information stream is transmitted to all users in the network [12]. We consider three 
classes of scheduling algorithms with progressively increasing complexity. The first class strives for 
minimum complexity by resorting to a static scheduling strategy along with memoryless decoding^1.
In this approach, we schedule transmission to a fraction of the users that enjoy favorable channel
conditions. While the identity of the target users changes, based on the channel conditions, the
static nature of the algorithm is manifested in the fact that a fixed fraction of the users is able to 
decode every transmitted packet. We establish the throughput-delay tradeoff allowed by varying the 
fraction of users targeted in every transmission. To gain more insight into the problem, we study in 
more detail the three special cases of scheduling transmissions to the best, worst and median user^2.
Here we establish the asymptotic throughput optimality of the median user scheduler and show 
that the price for this optimality is an exponential growth in delay with the number of users. The 
second scheduling policy resorts to a higher complexity incremental redundancy encoding/decoding 
strategy to achieve a better throughput-delay tradeoff. This scheme is based on a hybrid Automatic 
Repeat reQuest (ARQ) strategy and is shown to yield a significant reduction in the delay, compared 
with the median user scheduler, at the expense of a minimal penalty in the throughput. The third, 
and most complex, scheduling strategy benefits from the cooperation between the different users to 
minimize the delay while achieving the optimal scaling law of the throughput. More specifically, we 
show that the proposed cooperative multicast strategy simultaneously achieves the optimal scaling 
laws of both throughput and delay at the expense of a high complexity. Finally, we extend our 
study to the multi-group scenario where independent streams of information are transmitted to
different groups of users. Here, we generalize our scheduling algorithms to exploit the multi-group 
diversity available in such scenarios. 

The rest of the paper is organized as follows. In Section 2, we introduce the system model 
along with our notation. In Section 3, we propose the three classes of scheduling algorithms for 
the pure multicast scenario and characterize the achieved throughput-delay tradeoffs. We then 
extend our schemes to exploit the multi-group diversity in Section 4. The potential performance 
gains allowed by multi-transmit antenna base stations are quantified in Section 5. In Section 6, 
we present numerical results that validate our theoretical claims in certain representative scenarios. 
Finally, some concluding remarks are offered in Section 7. In order to enhance the flow of the paper, 
we collect all the proofs in the Appendices. 

^1 Memoryless decoding refers to the fact that the decoder memory is flushed in case of decoding failure.
^2 These notions will be defined rigorously in the sequel.

2 System Model



We consider the downlink of a single cell system where a base station serves G groups of users. The 
information streams requested by the different groups from the base station are independent of each 
other. Each group consists of N users. All the users within a group request the same information
from the base station. Unless otherwise stated, the base station is assumed to be equipped with a 
single transmit antenna. Each user is assumed to have only a single receive antenna. We consider 
time-slotted transmission in which the signal received by user i in time slot k is given by 

$$y_i[k] = h_i x[k] + n_i[k],$$

where $x[k]$ denotes the complex-valued signal transmitted by the base station in slot $k$, $h_i$ represents
the complex flat fading coefficient of the channel between the base station and the $i$th user, and
$n_i[k]$ represents the zero-mean unit-variance complex additive white Gaussian noise at the $i$th user
in slot $k$. The noise processes are assumed to be circularly symmetric and independent across users.
The channel between the base station and each user is assumed to be quasi-static with coherence
time $T_c$. Thus the fading coefficients remain constant throughout an interval of length $T_c$ and
change independently from one interval to the next. The fading coefficients $\{h_i\}$ are assumed to be
independent and identically distributed (i.i.d.) across the users and follow a Rayleigh distribution
with $E[|h_i|^2] = 1$. In this paper, we restrict our attention to this symmetric scenario, and hence,
issues related to fairness are outside the scope of this work. Each packet transmitted by the base 
station is assumed to be of constant size S. We further employ the following short term average 
power constraint 



$$E\left[|x[k]|^2\right] \le P.$$



Clearly, further performance gain may be reaped through a carefully constructed power alloca- 
tion policy if this short term power constraint is replaced by a long term one. This line of work, 
however, is not pursued here and we only rely on rate adaptation and scheduling based on the 
instantaneous channel state. The scheduling schemes proposed in the sequel require one further as- 
sumption. We require all the channel gains to be available at the base station. Hence the proposed 
scheduling strategies, except the incremental redundancy scheme^3, assume perfect knowledge of the
channel state information (CSI) at both the transmitter and receiver. In our throughput analysis, 
we use capacity expressions for the channel transmission rates. Here we implicitly assume that the 
base station employs coding schemes that approach the channel capacity, which justifies our use of
the fundamental information theoretic limit of the channel. 
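
As a minimal sketch of the channel model above, the following Python fragment draws one coherence interval of i.i.d. Rayleigh fading and evaluates the capacity expression $\log(1 + |h_i|^2 P)$ for each user. The function name `slot_rates` and its arguments are hypothetical and serve only as an illustration; they are not part of the paper.

```python
import math
import random

def slot_rates(n_users, power, rng):
    """Draw one coherence interval: per-user gains |h_i|^2 and rates in nats."""
    gains = []
    for _ in range(n_users):
        # h_i ~ CN(0, 1): real and imaginary parts are N(0, 1/2), so E[|h_i|^2] = 1.
        re = rng.gauss(0.0, math.sqrt(0.5))
        im = rng.gauss(0.0, math.sqrt(0.5))
        gains.append(re * re + im * im)
    # Capacity expression used in the throughput analysis (natural logarithm).
    rates = [math.log(1.0 + g * power) for g in gains]
    return gains, rates
```

Under this model the power gains $|h_i|^2$ are exponentially distributed with unit mean, which is the property used by the schedulers in the sequel.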

In our delay analysis, we consider backlogged queues, and hence, the only meaningful measure 
of delay is the transmission delay. This leads to the following definitions for throughput and delay 
that will be adopted in the sequel. 

Definition 1 The throughput of a scheduling scheme is defined as the sum of the throughputs 
provided by the base station to all the individual users within all the groups in the system. 



Definition 2 The delay of a scheduling scheme is defined as the delay between the instant repre- 
senting the start of transmission of a packet belonging to a particular group of users, and the instant 
when the packet is successfully decoded by all the users in that group. 

^3 For the incremental redundancy scheme, the base station only needs to know when to stop transmission of the
current codeword.

A brief comment on the notion of delay adopted in our work is now in order. This definition
suffers from the fundamental weakness that it does not account for the queuing delay experienced 
by the packets. Unfortunately, at the moment we do not have an analytical characterization of the 
queuing delay for the general case. However, as argued in the sequel, our delay analysis offers a lower 
bound on the total delay which is very tight in several important special cases. Furthermore, this 
analysis provides a very useful tool for rank-ordering the different classes of scheduling algorithms 
and sheds light on their structural properties. 

To facilitate analytical tractability, we focus on evaluating the asymptotic scaling laws of the 
throughput and delay in the sequel. In this analysis, we use the following set of Knuth's asymptotic
notations throughout the paper: 1) $f(n) = O(g(n))$ iff there are constants $c$ and $n_0$ such that
$f(n) \le c\,g(n)$ $\forall n > n_0$, 2) $f(n) = \Omega(g(n))$ iff there are constants $c$ and $n_0$ such that
$f(n) \ge c\,g(n)$ $\forall n > n_0$, and 3) $f(n) = \Theta(g(n))$ iff there are constants $c_1$, $c_2$ and $n_0$
such that $c_1 g(n) \le f(n) \le c_2 g(n)$ $\forall n > n_0$. Furthermore, the two following technical assumptions are imposed.

1. We let

$$T_c = [\ldots] \qquad (1)$$

This technical assumption is made to ensure (as shown in the sequel) that the average service
time required for transmitting a packet is not dominated by the scaling behavior of $T_c$.

2. In our delay analysis, we make an exponential server assumption, i.e., the rate of service R 
offered by the server in any time slot is assumed to follow an exponential distribution with the 
same mean as that obtained from our problem formulation. Thus, for a particular scheduling 
algorithm, the service rate distribution is given by 

$$F_R(r) = 1 - e^{-\mu r}, \quad r \ge 0, \qquad (2)$$

where $\mu = 1/E[R]$ depends on the channel characteristics and the scheduling algorithm.

3 Single Group (Pure Multicast) Scenario 

In this section, we consider the pure multicast scenario where the same information stream is
transmitted to all users in the network. In the non-cooperative scenario, the throughput-optimal scheme
is an $N$-level superposition coding/successive decoding scheme [13]. This strategy, however, suffers
from excessive complexity and the corresponding delay analysis seems intractable at the moment.
This motivates our work where we focus on the throughput-delay tradeoff of low complexity
scheduling schemes. Interestingly, we identify a low complexity static scheduling scheme, as defined in the
next section, that achieves the optimal scaling law of the throughput. Furthermore, we establish 
the optimality of the proposed cooperative multicast scheme in terms of the scaling laws of both 
delay and throughput. 

3.1 Static Scheduling With Memoryless Decoding 

In this class of scheduling algorithms, referred to as static schedulers in the sequel, we schedule 
transmission to a fixed fraction of the users with favorable channel conditions. The transmission
rate is adjusted such that each transmission by the base station is intended for successful reception by
$(N/\alpha)$ users in the system. Hence at any time instant, the base station transmits to the user whose
instantaneous SNR occupies the $(N - (N/\alpha) + 1)$th position in the ordered list of instantaneous SNRs
of all users. The other $((N/\alpha) - 1)$ users with higher channel gains can also decode the transmitted
information. The parameter $\alpha$ of the scheme is restricted to be an integer factor of $N$, so that
$1 \le \alpha \le N$. This scheme is "static" in the sense that the fraction of users targeted in every
transmission remains the same (i.e., the parameter $\alpha$ is not a function of time). When $\alpha > 1$, some
of the users will not be able to decode. The memoryless property dictates that those users flush their
memories and wait for future re-transmissions of the packet. This assumption is imposed to limit
the complexity of the encoding/decoding process. In Section 3.2, we relax this memoryless decoding
assumption and quantify the gains offered by carefully constructed ARQ schemes. As shown later,
this class of static scheduling algorithms exploits both the multi-user diversity and multicast gains,
to varying degrees, depending on the parameter $\alpha$.

The average throughput of this general static scheduling scheme is given by

$$R_{tot} = \frac{N}{\alpha} E[R_\alpha],$$

where $R_\alpha$ is the transmission rate to each of the intended $(N/\alpha)$ users and is given by

$$R_\alpha = \log\left(1 + |h_{\pi(N - \frac{N}{\alpha} + 1)}|^2 P\right), \qquad (3)$$

where $|h_{\pi(N - \frac{N}{\alpha} + 1)}|^2$ is the channel power gain of the user whose SNR occupies the $(N - (N/\alpha) + 1)$th
position in the ordered list of SNRs of all users. Throughout the paper, the log(.) function refers 
to the natural logarithm, and hence, the average throughput is expressed in nats. 
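
The rate selection rule above can be sketched for a single time slot as follows. This is an illustrative fragment under the stated model (ascending ordering of the channel power gains, rates in nats); the function name `static_schedule` and its return layout are assumptions introduced here, not notation from the paper.

```python
import math

def static_schedule(gains, alpha, power):
    """Return (served users, per-user rate, total rate) for one slot, in nats."""
    n = len(gains)
    if alpha < 1 or n % alpha != 0:
        raise ValueError("alpha must be a positive factor of the number of users")
    group = n // alpha
    # Users in ascending order of channel power gain.
    order = sorted(range(n), key=lambda i: gains[i])
    # Position N - N/alpha + 1 (1-based) is index N - N/alpha (0-based).
    target = order[n - group]
    rate = math.log(1.0 + gains[target] * power)
    # The target user and the N/alpha - 1 stronger users can all decode.
    served = sorted(order[n - group:])
    return served, rate, group * rate
```

With `alpha = 1` the target is the weakest user and every user decodes, while `alpha = len(gains)` targets only the strongest user.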

A critical step in the delay analysis is to identify the queuing model. In our model, the base
station maintains $\binom{N}{N/\alpha}$ queues, one for each combination of $(N/\alpha)$ users. These queues can be
divided into sets with $\alpha$ coupled queues in each set such that the combinations of users served by
the $\alpha$ queues within a set are mutually exclusive (to ensure that multiple copies of the same packet
are not sent to any of the users) and collectively exhaustive (to ensure that the packet reaches all
the users), i.e., every user in the system is served by exactly one of the $\alpha$ queues in each set. For
example, with $N = 6$ users and $\alpha = 3$, we have 15 queues divided into 5 sets with three queues in
each set (one possible set of coupled queues serves users {(1,2), (3,4), (5,6)} and another possible
set may serve users {(1,4), (2,5), (3,6)}; note that each user occurs once and only once in each
set). Hence, any packet that arrives at the base station is routed towards one of the sets^4 where it
is stored in all the $\alpha$ queues within that set (since it needs to be transmitted to all the users in the
system). Thus the delay in transmitting a particular packet to all the users is given by the delay
in transmitting that packet from each of the $\alpha$ coupled queues in the corresponding set. Moreover,
the base station services only one of the $\binom{N}{N/\alpha}$ queues at any time, which is chosen based on the
instantaneous fading coefficients of all the users. An example of the queuing model for a system
with $N = 6$ users and $\alpha = 3$ is shown in Fig. 1.
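
For the special case in which every queue serves exactly two users (i.e., $\alpha = N/2$, as in the six-user example), one possible way to build the sets of coupled queues is the classical circle (round-robin) construction sketched below. This is only an illustration: the paper does not specify how the sets are formed, the function name `coupled_queue_sets` is hypothetical, and larger groups of users per queue would need a different construction.

```python
from itertools import combinations

def coupled_queue_sets(n):
    """Split all 2-user queues of n users (n even) into n - 1 sets of n/2 disjoint pairs."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even number >= 2")
    users = list(range(1, n + 1))
    fixed, ring = users[-1], users[:-1]
    sets = []
    for r in range(n - 1):
        # Circle method: rotate the ring and pair the fixed user with ring position 0,
        rot = ring[r:] + ring[:r]
        pairs = [tuple(sorted((rot[0], fixed)))]
        # then pair the remaining positions symmetrically around the ring.
        for k in range(1, n // 2):
            pairs.append(tuple(sorted((rot[k], rot[-k]))))
        sets.append(sorted(pairs))
    return sets
```

For `n = 6` this yields 5 sets of 3 queues, every user appears exactly once per set, and the 15 two-user queues are each used in exactly one set.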

^4 Here, we use a probabilistic approach for choosing the set with a uniform distribution.

In our analysis, we benefit from the concept of worst case delay proposed in [14] for analyzing
the delay in unicast networks. In this work, the authors characterized the worst case delay by
restating their problem as the "coupon collector problem" which has been studied extensively in
the mathematics literature [15-17]. In the coupon collector problem, the users are assumed to have
coupons and the transmitter is the collector that selects one of the users randomly (with uniform 
distribution) and collects his coupon. The problem is to characterize the average number of trials 
required to ensure that the collector collects m coupons from all the users. Our queuing problem 
is analogous to the coupon collector problem with the only fundamental difference being that the 
size of the coupons is time-varying in our problem due to rate adaptation (the detailed analysis is 
presented in the proofs). Now, we are ready to state our result that characterizes the scaling laws 
of throughput and delay for the different static scheduling algorithms. 
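
The coupon collector problem described above can be explored numerically with the short sketch below. It assumes coupons of equal size (the time-varying coupon size caused by rate adaptation is not modeled), and the function names are hypothetical. For $m = 1$ the expected number of trials is the well-known value $N H_N$, where $H_N$ is the $N$th harmonic number.

```python
import random

def expected_trials_exact(n_users):
    """Expected draws to select every user at least once: N * H_N."""
    return n_users * sum(1.0 / k for k in range(1, n_users + 1))

def simulate_trials(n_users, m, runs=4000, seed=0):
    """Average number of uniform draws until every user was selected at least m times."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        counts = [0] * n_users
        remaining = n_users
        draws = 0
        while remaining:
            i = rng.randrange(n_users)
            counts[i] += 1
            draws += 1
            if counts[i] == m:
                remaining -= 1
        total += draws
    return total / runs
```

Comparing `simulate_trials(n, 1)` with `expected_trials_exact(n)` gives a quick sanity check of the simulation before larger values of `m` are tried.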

Theorem 3 The average throughput $R_{tot}$ of the general static scheduling scheme is given by

$$R_{tot} = \frac{N}{\alpha}\,[\ldots], \qquad (4)$$

where

$$\mathrm{Ei}(x) = \int_{-\infty}^{x} \frac{e^{t}}{t}\,dt.$$

The average delay $D$ of this scheme satisfies

$$D = \max\left\{ [\ldots] \binom{N}{N/\alpha} \frac{\log \alpha}{\log\log [\ldots]},\ [\ldots] \right\}, \qquad (5)$$

where $X_{min} = \min_{i=1}^{\alpha} X_i$ and the $X_i$'s are defined as the service times required for transmitting a
packet from the $i$th queue of a set of $\alpha$ queues assuming that the server always services the $i$th queue.

To gain more insights into the rather involved throughput and delay expressions of Theorem 3,
we study three special cases of the general static scheduling scheme in more detail. This detailed
analysis sheds light on the throughput-delay tradeoff achievable by varying $\alpha$. We further establish
the optimality of the scheduler corresponding to $\alpha = 2$ with respect to the throughput scaling law.



3.1.1 Worst User Scheduler 

The worst user scheme corresponds to the case $\alpha = 1$ of the general scheduling scheme. This
scheme maximally exploits the multicast gain by always transmitting to the user with the least 
instantaneous SNR. This enables all the users to successfully decode the transmission and thus any 
particular packet reaches all the users in a single transmission. However, the multi-user diversity 
inherent in the system works against the performance of this scheme and results in a decrease in 
the individual throughput to any user. 

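The average throughput of the worst user scheme is given by

$$R_{tot} = N E\left[\log\left(1 + |h_{\pi(1)}|^2 P\right)\right],$$

where $|h_{\pi(1)}|^2$ is the minimum channel gain among all the $N$ users in the system, whose distribution
and density functions are given by

$$F_{|h_{\pi(1)}|^2}(x) = 1 - e^{-Nx}, \qquad f_{|h_{\pi(1)}|^2}(x) = N e^{-Nx}, \qquad x \ge 0.$$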

For implementing this scheme, the base station needs to maintain only a single queue that caters 
to all the users in the system.

Lemma 4 The average throughput of the worst user scheme scales as

$$R_{tot} = \Theta(1) \qquad (6)$$

with the number of users $N$. The average delay of this scheme scales as

$$D = \Theta(N). \qquad (7)$$

Thus the average throughput of the worst user scheme does not scale with the number of users 
N while the average delay increases linearly with N. 
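
The bounded throughput of the worst user scheme can be checked numerically. The sketch below evaluates $N E[\log(1 + XP)]$ for $X$ exponentially distributed with rate $N$ (the minimum of $N$ unit-mean exponential gains) using a plain trapezoidal rule; the function name and the integration parameters `u_max` and `steps` are illustrative choices, not part of the paper.

```python
import math

def worst_user_throughput(n_users, power, u_max=60.0, steps=60000):
    """Numerically evaluate N * E[log(1 + X * P)] with X ~ Exp(rate N), in nats."""
    # Substituting u = N x gives N * integral_0^inf log(1 + u P / N) e^{-u} du.
    h = u_max / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal end-point weights
        total += weight * math.log1p(u * power / n_users) * math.exp(-u)
    return n_users * total * h
```

Since $\log(1 + z) \le z$, the value never exceeds $P$, and it approaches $P$ as the number of users grows, in line with the $\Theta(1)$ scaling of Lemma 4.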

3.1.2 Best User Scheduler 

This scheme corresponds to the case $\alpha = N$ of the general scheduling scheme and maximally exploits
the multi-user diversity available in the system. Since the transmission rate is adjusted based on
the user with the maximum instantaneous SNR, this scheme fails to exploit any of the multicast
gain and any particular packet must be repeated $N$ times. The average throughput of the best user
scheme is given by

$$R_{tot} = E\left[\log\left(1 + |h_{\pi(N)}|^2 P\right)\right],$$

where $|h_{\pi(N)}|^2$ is the maximum channel gain among all the $N$ users in the system, whose distribution
function is given by

$$F_{|h_{\pi(N)}|^2}(x) = \left(1 - e^{-x}\right)^N, \qquad x \ge 0.$$

In this special case, the base station maintains N queues, one for each user in the system, and any 
packet that arrives into the system enters all the N queues. The following result establishes the 
throughput and delay scaling laws achieved by the best user scheduler. 

Lemma 5 The average throughput of the best user scheme scales as

$$R_{tot} = \Theta(\log\log N) \qquad (8)$$

with the number of users $N$. The average delay of this scheme scales as

$$D = \Theta\left(\frac{N \log N}{\log\log N}\right). \qquad (9)$$

From Lemmas 4 and 5, one can conclude that maximally exploiting the multi-user diversity 
yields higher throughput gains than maximally exploiting the multicast gain. This throughput gain, 
however, is obtained at the expense of a higher delay. This observation motivates the investigation of 
other variants of the static scheduling strategy which achieve other points on the throughput-delay 
tradeoff.

3.1.3 Median User Scheduler

The median user scheduler corresponds to the case $\alpha = 2$ of the general scheduling scheme. In
this scheme, the base station maintains $\binom{N}{N/2}$ queues, one for each combination of $(N/2)$ users.
This scheme strikes a balance between exploiting multi-user diversity and multicast gain. The base 
station always transmits to the user whose instantaneous SNR occupies the median position of 
the ordered list of SNRs. Each transmission is, therefore, successfully decoded by half the users 
in the system and the same information needs to be repeated only twice before it reaches all the 
users. Thus, unlike the best user scheduler, this scheduler benefits from the wireless multicast gain. 
Moreover, unlike the worst user scheduler, the inherent multi-user diversity does not degrade the 
performance of this scheduler (since the instantaneous SNR of the median user is not expected to 
degrade with N). In fact, we show in the following that this scheme achieves the optimal scaling
law of the throughput as the number of users N grows to infinity. 

Lemma 6 The proposed median user scheme achieves the optimal scaling law of the throughput. 
The average throughput of this scheme scales as 

$$R_{tot} = \Theta(N) \qquad (10)$$

with the number of users N. The average delay of this scheme scales as 




Thus the throughput optimality of the median user scheduler is obtained at the expense of an 
exponentially increasing delay with the number of users N. Overall, these three special cases of 
the static scheduling strategy show that one can achieve different points on the throughput-delay 
tradeoff by varying $\alpha$.
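
The throughput side of this comparison can be reproduced with a small Monte Carlo experiment. The sketch below estimates $(N/\alpha) E[R_\alpha]$ under unit-mean exponential power gains; the function name, the number of trials, and the seed are arbitrary illustrative choices, and the delay is not simulated here.

```python
import math
import random

def average_throughput(n_users, alpha, power, trials=2000, seed=0):
    """Monte Carlo estimate of (N/alpha) * E[R_alpha] under Rayleigh fading, in nats."""
    rng = random.Random(seed)
    group = n_users // alpha
    total = 0.0
    for _ in range(trials):
        # |h_i|^2 is exponential with unit mean under Rayleigh fading.
        gains = sorted(rng.expovariate(1.0) for _ in range(n_users))
        # The user at ascending position N - N/alpha + 1 sets the rate.
        total += group * math.log(1.0 + gains[n_users - group] * power)
    return total / trials
```

For example, with 64 users and unit power, `alpha = 1`, `alpha = 64`, and `alpha = 2` give the worst, best, and median user schedulers respectively, and the median user estimate is expected to be far larger than the other two.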

3.2 Incremental Redundancy Multicast 

In this section, we relax the memoryless decoding requirement and propose a scheme that
employs a higher complexity incremental redundancy encoding/decoding strategy to achieve a better
throughput-delay tradeoff than the static scheduling schemes. The proposed scheme is an extension
of the incremental redundancy scheme given by Caire et al. in [18]. An information sequence of b
bits is encoded into a codeword of length LM, where M refers to the rate constraint. The first L 
bits of the codeword are transmitted in the first attempt. If a user is unable to successfully decode 
the transmission, it sends back an ARQ request to the base station. If the base station receives 
an ARQ request from any of the users, it transmits the next L bits of the same codeword in the 
next attempt. This process continues until either all users successfully decode the information 
or the rate constraint M is violated. Then the codeword corresponding to the next b information 
bits is transmitted in the same fashion. In this scheme, even if some of the users successfully de- 
code the information in very few attempts, they still have to wait until all the N users successfully 
receive the information before any new information is transmitted to them by the base station. 
This sub-optimality of the proposed scheme results in significant complexity reduction by avoiding 
the use of superposition coding and successive decoding. Moreover, this scheme does not require
the knowledge of perfect CSI at the base station. The base station only needs to know when to
stop transmission of the current codeword. Hence the feedback required is minimal. The following 
result establishes the superior throughput-delay tradeoff achieved by this scheme, compared with 
the class of static schedulers with memoryless decoding. 
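
The retransmission rule of this section can be sketched as follows. The fragment relies on an idealized accumulated mutual information model, which is an assumption made here for illustration: each attempt is taken to contribute $\log(1 + |h_i|^2 P)$ nats to user $i$, and a user decodes once its accumulated total reaches the information content of the codeword. The function name and argument names are hypothetical.

```python
import math

def ir_attempts(gains_per_attempt, info_nats, power, max_attempts):
    """Count transmissions until every user has accumulated info_nats, or M is reached."""
    n_users = len(gains_per_attempt[0])
    accumulated = [0.0] * n_users
    for attempt, gains in enumerate(gains_per_attempt[:max_attempts], start=1):
        # Users keep all past observations, so the mutual information adds up.
        for i, g in enumerate(gains):
            accumulated[i] += math.log(1.0 + g * power)
        # The base station stops as soon as no user sends an ARQ request.
        if all(a >= info_nats for a in accumulated):
            return attempt, True
    return min(len(gains_per_attempt), max_attempts), False
```

The second return value is `False` when the rate constraint is hit before all users decode. Note that the base station only needs the stop signal, not the gains themselves; the gains appear here solely to drive the simulation.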

Theorem 7 The average throughput of the incremental redundancy scheme scales as

$$R_{tot} = \Theta\left(\frac{N \log\log N}{\log N}\right) \qquad (12)$$

with the number of users $N$. The average delay $D$ of this scheme scales as

$$D = \Theta\left(\frac{\log N}{\log\log N}\right). \qquad (13)$$

Thus, we can see that incremental redundancy multicast avoids the exponentially growing delay
of the median user scheduler at the expense of a minimal penalty in throughput. In fact, the
loss in both delay and throughput scaling laws, compared to the optimal values, is only a factor
of $\log(N)/\log\log(N)$. In this approach, the base station needs to maintain only a single queue
that serves all the users in the system. This approach, however, entails added complexity in the
incremental redundancy encoding and the storage and joint decoding of all the observations.

3.3 Cooperative Multicast

In this section, we demonstrate the benefits of user cooperation and quantify the tremendous gains
that can be achieved by allowing the users to cooperate with each other. In particular, we propose
a cooperation scheme that minimizes the delay while achieving the optimal scaling law of the
throughput. This scheme is divided into two stages. In the first half of each time slot, the base
station transmits the packet to one half of the users in the system (i.e., the median user scheduler).
During the next half of the slot, the base station remains silent. Meanwhile all the users that
successfully decoded the packet in the first half of the slot cooperate with each other and transmit
the packet to the other $(N/2)$ users in the system. This is equivalent to a transmission from a
transmitter equipped with $(N/2)$ transmit antennas to the worst user in a group of $(N/2)$ users.
If $R_{s1}$ and $R_{s2}$ are the rates supported in the first and second stage respectively, then the actual
transmission rate is chosen to be $\min\{R_{s1}, R_{s2}\}$ in both stages of the cooperation scheme. Note
that the rate $R_{s2}$ is chosen such that the information can be successfully decoded even by the worst
of the remaining $(N/2)$ users. Here, we note that this scheme requires the base station to know
the CSI of the inter-user channels. The scheme, however, does not require the users to have such
transmitter CSI (i.e., in the second stage the users cooperate blindly by using i.i.d. random coding).
The average throughput of the proposed cooperation scheme is thus given by

$$R_{tot} = \frac{N}{2} E\left[\min\{R_{s1}, R_{s2}\}\right].$$

The following result establishes the optimality of the proposed scheme, in terms of both delay
and throughput scaling laws.

Theorem 8 The proposed cooperation scheme achieves the optimal scaling laws of both delay and
throughput. In particular, the average throughput of this scheme scales as

$$R_{tot} = \Theta(N) \qquad (14)$$

with the number of users $N$. The average delay $D$ of this scheme scales as

$$D = \Theta(1). \qquad (15)$$

Here we assume that the inter-user channels have the same fading statistics as the channels between 
the base station and users, and the total transmitted power is upper bounded by P. 

The price for this optimal performance is the added complexity needed to 1) equip every user 
terminal with a transmitter, 2) decode/re-encode the information at each cooperating user terminal, 
and 3) inform the base station with perfect CSI of the inter-user channels. 
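
The two-stage rate selection can be illustrated with the sketch below. It rests on assumptions that go beyond the text and are introduced here only for illustration: the cooperating users split the total power $P$ equally and use independent random codebooks, so that a waiting user $j$ sees an effective SNR of $(P/K) \sum_i |g_{ij}|^2$, where $K = N/2$ is the number of cooperating users and $g_{ij}$ is the inter-user channel from user $i$ to user $j$. The function name and the matrix layout `inter_user_gains[i][j]` (power gain from user `i` to user `j`) are hypothetical.

```python
import math

def cooperative_slot(bs_gains, inter_user_gains, power):
    """Return (first-stage users, R_s1, R_s2, total rate) for one two-stage slot."""
    n = len(bs_gains)
    half = n // 2
    order = sorted(range(n), key=lambda i: bs_gains[i])
    helpers, waiting = order[n - half:], order[:n - half]
    # Stage 1: median user scheduler, decodable by the stronger half of the users.
    r_s1 = math.log(1.0 + bs_gains[helpers[0]] * power)
    # Stage 2 (assumption): helpers split P equally and use independent codebooks,
    # so user j sees SNR (P / half) * sum_i |g_ij|^2; the worst waiting user sets R_s2.
    r_s2 = min(
        math.log(1.0 + power / half * sum(inter_user_gains[i][j] for i in helpers))
        for j in waiting
    )
    rate = min(r_s1, r_s2)
    # Each stage lasts half a slot, so all N users receive rate / 2 per slot.
    return sorted(helpers), r_s1, r_s2, n * rate / 2.0
```

The base station needs both `bs_gains` and `inter_user_gains` to pick the common rate, which mirrors the CSI requirement listed above.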



4 Multi-group Diversity

In this section, we generalize the scheduling schemes proposed in Section 3 to the multi-group
scenario where different information streams are requested by different subsets of the user population.
We modify the proposed schemes to exploit the multi-group diversity available in this scenario by
always transmitting to the best group. We characterize the asymptotic scaling laws of the
throughput and delay of the static schedulers with the number of users per group $N$ and the number of
groups $G$ in the following theorem.

Theorem 9 1. The average throughput of the best among worst users scheme scales as

$$R_{tot} = \Theta(\log G) \qquad (16)$$

with $N$ and $G$. The average delay of this scheme scales as

$$D = [\ldots] \qquad (17)$$

2. The average throughput of the best among best users scheme scales as

$$R_{tot} = \Theta(\log\log NG) \qquad (18)$$

with $N$ and $G$. The average delay of this scheme scales as

$$D = [\ldots] \qquad (19)$$

3. The average throughput of the best among median users scheme satisfies

$$\Omega(N) = R_{tot} = O(N \log\log G), \qquad (20)$$

while the average delay of this scheme satisfies

$$D = [\ldots] \qquad (21)$$

In the multi-group incremental redundancy scheme, the information bits corresponding to each
of the groups are encoded independently. During each time slot, the base station selects that group 
for which it can send the highest total instantaneous rate to the users who failed to decode up to 
this point. This selection process makes the scheme "dynamic" in the sense that the outcome of 
the scheduling process at any particular time slot depends on the outcomes in all previous slots. 
Unfortunately, this dynamic nature of the proposed scheme adds significant complexity to the 
problem and, at the moment, we do not have an analytical characterization of the corresponding 
scaling laws. 

In the multi-group cooperation scheme, during each time slot, the base station selects the best 
group g for transmission according to the condition 

$$g = \arg\max_{g' \in \{1, \ldots, G\}} \left\{ \frac{N}{2} \min\{R_{s1}^{g'}, R_{s2}^{g'}\} \right\}. \qquad (22)$$

Theorem 10 The average throughput of the proposed multi-group cooperation scheme satisfies

$$\Omega(N) = R_{tot} = O(N \log\log G), \qquad (23)$$

while the average delay of this scheme satisfies

$$D = [\ldots] \qquad (24)$$

As expected, the throughput gain resulting from the multi-group diversity entails a correspond- 
ing price in the increased delay. 
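
For the static schedulers, the "always transmit to the best group" rule can be sketched as below. The selection metric used here, the instantaneous total rate $(N/\alpha) R_\alpha$ of each group, is an assumption made for illustration, and the function name `best_group` is hypothetical.

```python
import math

def best_group(group_gains, alpha, power):
    """Pick the group whose static-scheduler rate (N/alpha) * R_alpha is largest this slot."""
    best_index, best_rate = -1, -1.0
    for g, gains in enumerate(group_gains):
        n = len(gains)
        served = n // alpha
        # Gain of the user at ascending position N - N/alpha + 1 within the group.
        target_gain = sorted(gains)[n - served]
        rate = served * math.log(1.0 + target_gain * power)
        if rate > best_rate:
            best_index, best_rate = g, rate
    return best_index, best_rate
```

Setting `alpha` to 1, to the group size, or to 2 gives the best among worst, best among best, and best among median users schemes respectively.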



5 Multi-Transmit Antenna Gain

The performance of the proposed static scheduling schemes depends on the spread of the fading
distribution. For exploiting significant multi-user diversity gains, the distribution needs to be
well spread out. The lower the spread of the distribution, the smaller the multi-user diversity gain (or loss,
as shown in the following). To illustrate this point, we consider a scenario where the base station
is equipped with L transmit antennas. We assume that the base station has knowledge of only the 
total effective SNR at any particular user and does not know the individual channel gains from 
each transmit antenna to that user. Under this assumption, the base station just distributes the 
available power equally among all the L transmit antennas. Thus the effective fading power gains 
follow a normalized Chi-square distribution with 2L degrees of freedom. Note that the fading power 
gains are exponentially distributed (Chi-square with 2 degrees of freedom) in the single transmit 
antenna case. We now characterize the asymptotic scaling laws of the throughput of the proposed 
static schedulers for this multi-transmit antenna scenario. Note that all the results in this section 
are derived for the case where L is a constant and does not scale with A^. 
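The reduction in spread can be illustrated numerically. The following is a minimal sketch, assuming that the normalized Chi-square gain with 2L degrees of freedom is modelled as the average of L independent unit-mean exponential gains; the function names and sample sizes are illustrative choices, not part of the original analysis.

```python
import random
import statistics

def normalized_gain(L, rng):
    """Effective fading power gain with L transmit antennas and equal power split,
    modelled as the average of L unit-mean exponential gains."""
    return sum(rng.expovariate(1.0) for _ in range(L)) / L

def gain_stats(L, samples=20000, seed=0):
    """Return the empirical mean and variance of the normalized gain."""
    rng = random.Random(seed)
    draws = [normalized_gain(L, rng) for _ in range(samples)]
    return statistics.fmean(draws), statistics.pvariance(draws)

if __name__ == "__main__":
    for L in (1, 2, 4, 8):
        mean, var = gain_stats(L)
        print(f"L={L}: mean={mean:.3f}, variance={var:.3f}, 1/L={1 / L:.3f}")
```

Under this model the mean stays at one while the variance behaves like 1/L, which is the sense in which the distribution becomes less spread out as L grows.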



5.1 Worst User Scheduler 

For the worst user scheme, the average throughput is given by

$$R_{tot} = N\,E\left[\log\left(1 + |\chi_{min}|^2 P\right)\right],$$

where $|\chi_{min}|^2 = \min_{i=1}^{N} |\chi_i|^2$ and $|\chi_i|^2$ corresponds to the effective fading power gain at the $i$-th user
that follows a normalized Chi-square distribution with 2L degrees of freedom and whose distribution
function is given by

$$F(x) = 1 - e^{-Lx} \sum_{k=0}^{L-1} \frac{(Lx)^k}{k!}, \quad x \ge 0. \qquad (25)$$

Lemma 11 When the base station is equipped with L transmit antennas, the average throughput 
of the worst user scheme scales as 

$$R_{tot} = \Theta\left(N^{\frac{L-1}{L}}\right). \qquad (26)$$

Thus the average throughput increases with L. This is expected since the performance of the
worst user scheduler is degraded by the lower tail of the fading distribution. As L increases, the
spread of the fading distribution decreases and, consequently, the inherent multi-user diversity has
a reduced effect on the performance of the scheduler. This leads to a rise in the average throughput
of the worst user scheme from Θ(1) for the single transmit antenna case to Θ(N) for large values
of L.
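This trend can be checked with a small Monte-Carlo experiment. The sketch below estimates $N E[\log(1 + |\chi_{min}|^2 P)]$ for a fixed number of users; it assumes unit-mean Gamma-distributed gains (shape L, scale 1/L) as the normalized Chi-square model, and the function name, trial count and seed are illustrative.

```python
import math
import random

def worst_user_throughput(N, L, P=1.0, trials=2000, seed=1):
    """Monte-Carlo estimate of N * E[log(1 + min_i gain_i * P)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # gammavariate(L, 1/L) has unit mean: normalized Chi-square gain with 2L d.o.f.
        worst = min(rng.gammavariate(L, 1.0 / L) for _ in range(N))
        total += math.log(1.0 + worst * P)
    return N * total / trials

if __name__ == "__main__":
    for L in (1, 2, 4):
        print(L, round(worst_user_throughput(50, L), 3))
```

For L = 1 the estimate stays close to P, while larger L yields a visibly larger value at the same N, in line with the scaling stated in the lemma.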

5.2 Best User Scheduler 

For the best user scheme, the average throughput is given by

$$R_{tot} = E\left[\log\left(1 + |\chi_{max}|^2 P\right)\right],$$

where $|\chi_{max}|^2 = \max_{i=1}^{N} |\chi_i|^2$.



Lemma 12 When the base station is equipped with L transmit antennas, the average throughput 
of the best user scheme scales as 

$$R_{tot} = \Theta\left(\log\left(1 + \frac{\log N + (L-1)\log\log N}{L}\right)\right). \qquad (27)$$

Since the best user scheduler leverages multi-user diversity to enhance the throughput, one can 
see that the throughput of the best user scheme decreases as L increases. 
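The loss of multi-user diversity can be made concrete with a short numerical sketch. It evaluates the centering term $(\log N + (L-1)\log\log N)/L$, which is our reading of the expression inside (27), and compares it with a Monte-Carlo estimate of the mean of the largest gain. The Gamma model for the gains, the function names and the sample sizes are assumptions made for illustration.

```python
import math
import random

def centering(N, L):
    """(log N + (L - 1) log log N) / L, read from the expression inside (27)."""
    return (math.log(N) + (L - 1) * math.log(math.log(N))) / L

def mean_best_gain(N, L, trials=500, seed=2):
    """Monte-Carlo estimate of E[max_i gain_i] for unit-mean Gamma(L, 1/L) gains."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gammavariate(L, 1.0 / L) for _ in range(N))
    return total / trials

if __name__ == "__main__":
    for L in (1, 2, 4):
        print(L, round(centering(200, L), 3), round(mean_best_gain(200, L), 3))
```

Both quantities shrink as L increases for a fixed N, which is the behaviour the lemma describes for the best user scheme.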



6 Numerical Results 

Here we present simulation results that validate our theoretical claims. These results were obtained 
through Monte-Carlo simulations and were averaged over at least 5000 iterations. The power 
constraint P is taken to be unity. The throughput of the static schedulers, proposed in Section 3.1, 
is shown in Fig. 2 for different positions of the intended user in the ordered list of SNRs of all 
users. It is evident from the figure that, as predicted by the analysis, the throughput of the median 
user scheme is better than that of the best user scheme, which in turn outperforms the worst user 
scheme. In Fig. 3, we present a throughput comparison of all the schemes proposed in Section 3 for
increasing values of N. The corresponding delay comparison is presented in Fig. 4. The throughput
comparison for the different scheduling schemes in the multi-group scenario is presented in Fig. 5
with G = 5 groups (the corresponding delay comparison is presented in Fig. 6). Although the best
among worst users scheduler achieves a higher throughput than the best among best users scheme
for the range of N values shown in the plot, it should be noted that the latter eventually
outperforms the former for large values of N (N > 600). Except for this case, in all other considered
scenarios, the simulation results follow the trends predicted by our asymptotic
analysis. Finally, we observe that the asymptotic analysis yields accurate
predictions even with the relatively small number of users used in our simulations (i.e., on the order
of N = 10).
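A Monte-Carlo experiment of this kind can be sketched as follows for the three static schedulers in the single group, single antenna case. This is not the simulation code used for the figures. It assumes unit-mean exponential fading power gains, that the worst user scheme serves all N users at the rate of the weakest user, that the median user scheme serves the best N/2 users at the rate of the weakest among them, and that the best user scheme serves only the strongest user.

```python
import math
import random

def static_scheduler_throughputs(N, P=1.0, trials=5000, seed=3):
    """Monte-Carlo throughput estimates for the worst, median and best user schedulers."""
    rng = random.Random(seed)
    worst = median = best = 0.0
    for _ in range(trials):
        gains = sorted(rng.expovariate(1.0) for _ in range(N))
        # All N users are served at the rate supported by the weakest user.
        worst += N * math.log(1.0 + gains[0] * P)
        # The best N/2 users are served at the rate of the weakest among them.
        median += (N // 2) * math.log(1.0 + gains[N // 2] * P)
        # Only the strongest user is served.
        best += math.log(1.0 + gains[-1] * P)
    return {"worst": worst / trials, "median": median / trials, "best": best / trials}

if __name__ == "__main__":
    print(static_scheduler_throughputs(20))
```

With P = 1 and N = 20 the ordering median > best > worst described above is already visible.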



7 Conclusions 



In this paper, we have used a cross layer design approach to shed more light on the throughput-delay
tradeoff in the cellular multicast channel. Towards this end, we proposed three classes of
scheduling algorithms with progressively increasing complexity, and analyzed the throughput-delay
tradeoff achieved by each class. We first considered the class of low-complexity static scheduling
schemes with memoryless decoding. We showed that a special case of this scheduling strategy,
i.e., the median user scheduler, achieves the optimal scaling law of the throughput at the expense
of a delay that increases exponentially with the number of users. We then proposed an incremental
redundancy multicast scheme that achieves a superior throughput-delay tradeoff, at the expense of
increased encoding/decoding complexity. We further proposed a cooperation scheme that achieves
the optimal scaling laws of both throughput and delay at the expense of high RF and computational
complexity. We then generalized our schemes to the multi-group scenario and characterized
their ability to exploit the multi-group diversity offered by the wireless channel. Finally, we
presented simulation results that establish the accuracy of the predictions of our asymptotic analysis
in systems with a low to moderate number of users.



A Proof of Theorem 3 

The channel gain distribution function F(x) is given by

$$F(x) = \sum_{k=N-\frac{N}{\alpha}+1}^{N} \binom{N}{k} (1 - e^{-x})^{k} e^{-(N-k)x}, \quad x > 0.$$

Hence the average throughput of the proposed scheme is given by

$$R_{tot} = \frac{N}{\alpha} \int_0^\infty \log(1 + xP)\, dF(x).$$

Integrating by parts and simplifying, we obtain the average throughput as stated in equation (4) of
the theorem.

We now calculate the average delay of the proposed scheme. We consider each coherence interval
of length $T_c$ as a time slot. We first calculate the probability distribution of the service time X
required for transmitting a packet (of size S) when the base station always services the same queue.
The service time X is defined as

$$X = kT_c, \quad k \in \{1, 2, \dots\}, \qquad (28)$$

where k is such that

$$T_c \sum_{i=1}^{k-1} R_i < S \le T_c \sum_{i=1}^{k} R_i. \qquad (29)$$

Here $R_i$ represents the service rate in the $i$-th time slot as given in (3). The probability distribution
of X is given by

$$\Pr(X = kT_c) = \Pr\left( \sum_{i=1}^{k-1} R_i < \frac{S}{T_c} \le \sum_{i=1}^{k} R_i \right).$$

We let $C = S/T_c$ in the sequel. Using the exponential server assumption in (2) for the service
rates $\{R_i\}$, we have (for k > 1)

$$\Pr(X = kT_c) = \int_0^C f_{\sum_{i=1}^{k-1} R_i}(y)\, \Pr(R_k \ge C - y)\, dy$$

$$\Rightarrow \Pr(X = kT_c) = \frac{(\mu C)^{k-1} e^{-\mu C}}{(k-1)!}, \quad k \in \{1, 2, \dots\}. \qquad (30)$$

Now the average service time $\bar{X}$ is given by

$$\bar{X} = (1 + \mu C)T_c = T_c + \mu S.$$
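The service time distribution can be checked by direct simulation. The sketch below assumes that the exponential server assumption means i.i.d. exponential service rates with mean 1/μ per slot, counts the number of slots needed until the accumulated rate reaches C = S/T_c, and compares the empirical mean with 1 + μC. The function names and parameter values are illustrative.

```python
import math
import random

def service_slots(mu, C, rng):
    """Number of slots k until the accumulated service rate first reaches C."""
    k, served = 0, 0.0
    while True:
        k += 1
        served += rng.expovariate(mu)  # exponential service rate with mean 1/mu
        if served >= C:
            return k

def slot_pmf(k, mu, C):
    """Pr(X = k T_c) for a shifted Poisson law with parameter mu * C."""
    return (mu * C) ** (k - 1) * math.exp(-mu * C) / math.factorial(k - 1)

def empirical_mean_slots(mu, C, samples=20000, seed=4):
    rng = random.Random(seed)
    return sum(service_slots(mu, C, rng) for _ in range(samples)) / samples

if __name__ == "__main__":
    mu, C = 2.0, 1.5
    print(empirical_mean_slots(mu, C), 1 + mu * C)
```

Multiplying the mean number of slots by T_c recovers the average service time T_c + μS.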
Since G = 1 for the single group scenario, the assumption in (1) reduces to 

From the results in [19], we know that 

$$E[R_\alpha] \le E[R_N] = \Theta(\log\log N).$$

Hence 

^ E[R^] Viogiog^y " " 

Thus for all possible values of the parameter a, we have 

X = Tc + iiS = Q{iiS) = Q ^ 



Hence it is clear that the assumption on $T_c$ in (1) ensures that the average service time $\bar{X}$ is not
dominated by the scaling behaviour of $T_c$.

We now focus on one set of α coupled queues. Any packet that arrives into this set enters all
the α queues within the set and, moreover, the base station services only one of the $\binom{N}{N/\alpha}$ available
queues in any time slot. Note that $\bar{X}$ was calculated assuming that the base station always services
the same queue. We are interested in determining the delay involved in successfully transmitting a
particular packet from all the α coupled queues in the set. The actual delay, as defined in Section 2,
is the time between the start of transmission of a packet and the instant when the packet reaches
all the users in the system. In our analysis, we assume that the packet of interest is at the head
of all the α queues in the set at the start of transmission. This assumption thus yields a lower
bound on the actual delay.

We characterize the delay based on the observation that our queuing problem is equivalent to
the well-known "coupon collector" problem. This observation was made earlier in [14], where the
authors characterized the delay of the throughput-optimal broadcast scheme. They assumed that
the server (base station) offers a constant service rate which is independent of the instantaneous
channel gains. In our analysis, however, we have incorporated the effects of rate adaptation. Let
$X_1, X_2, \dots, X_\alpha$ denote the service times (assuming continuous service), with distribution as given
in (30), required for transmitting a packet from each of the α queues in the set. Then the delay of
the proposed scheduling scheme is directly proportional to the minimum number of trials required
to ensure that the first queue is served at least $(X_1/T_c)$ times by the base station, the second queue
is served at least $(X_2/T_c)$ times, and so on.

We lower bound the average delay by calculating the minimum number of trials required
to ensure that all the α queues are served at least $(X_{min}/T_c)$ times by the base station, where
$X_{min} = \min\{X_1, X_2, \dots, X_\alpha\}$. We determine the average number of such required trials $E[N_t|X_{min}]$
using the results derived in [14]. Since the base station services only one of the queues in any
time slot and the users are symmetric, there is an equal probability that the base station services
any one of the queues. Since we need to consider only one set of α coupled queues for determining
the delay, we consider all the other queues in the system jointly as one "dummy" queue called the
$(\alpha + 1)$-th queue. Now the probabilities $\{p_j\}$ of the server choosing the $j$-th queue are given by

$$p_1 = \cdots = p_\alpha = \frac{1}{\binom{N}{N/\alpha}}, \qquad p_{\alpha+1} = p_e = 1 - \frac{\alpha}{\binom{N}{N/\alpha}}.$$

These probabilities $\{p_j\}$ remain constant through all time slots and are not functions of the
instantaneous service rates $\{R_i\}$ provided by the base station. The Moment Generating Function (MGF)
of the number of trials required is given by [14]

$$P_{N_t|X_{min}}(z) = \sum_{i=0}^{\infty} \Pr(N_t > i)\, z^i = \sum_{i=0}^{\infty} b_i z^i,$$

where $b_i$ is the probability of failure of sending a packet to all the users in i channel uses. The value
of $b_i$ is equal to the polynomial

$$\left( \frac{x_1 + \cdots + x_\alpha}{\binom{N}{N/\alpha}} + p_e x_{\alpha+1} \right)^{i}$$

evaluated at $x_1 = \cdots = x_{\alpha+1} = 1$ after removing the terms that have all exponents of $x_1, \dots, x_\alpha$
greater than or equal to $(X_{min}/T_c)$ (denoted by the operator {.}). Thus the MGF of the number
of trials required is given by



[{N/a 



H \-Xa + Pe 



' N ' 



X. 



a+l 



evaluated at a;i 



Xa+i = 1- But we know that 



{n/o} 



L 



e —fdt. 



Using this identity and simplifying, we get 



Nt\Xr, 



N 
N/a 



L 



)(jV/a)* 



1 — il — S^X^i^^ 



{t)e- 



dt, 



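where

$$S_m(t) = \sum_{i=0}^{m-1} \frac{t^i}{i!}.$$

Hence the average number of trials required $E[N_t|X_{min}]$ is given by [14]

$$E[N_t|X_{min}] = P_{N_t|X_{min}}(1) = \binom{N}{N/\alpha} \int_0^\infty \left[ 1 - \left(1 - S_{X_{min}/T_c}(t)\, e^{-t}\right)^{\alpha} \right] dt = \binom{N}{N/\alpha} E\left[ \max_{1 \le i \le \alpha} Y_i \right],$$

where the $Y_i$'s are i.i.d. random variables that follow a Chi-square distribution with $(2X_{min}/T_c)$
degrees of freedom. From the results in [14], it can be seen that for such a sequence of random
variables $\{Y_i\}$,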

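$$E\left[ \max_{1 \le i \le \alpha} Y_i \right] = \max\left\{ \Theta(\log\alpha),\ \Theta\left(\frac{X_{min}}{T_c}\right) \right\}. \qquad (31)$$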
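The coupon collector argument can be sanity-checked by simulation. The sketch below assumes K equally likely queues per slot, of which α are the coupled queues of interest (the remaining K - α play the role of the dummy queue), and a fixed integer target m in place of $X_{min}/T_c$. It compares the simulated number of slots with K times the mean of the maximum of α independent Gamma(m, 1) variables, which is how we model the Chi-square variables $Y_i$ with 2m degrees of freedom. All names and parameter values are illustrative.

```python
import random

def trials_until_served(alpha, K, m, rng):
    """Slots until each of the alpha tagged queues has been selected at least m times,
    when every slot selects one of K queues uniformly at random."""
    counts = [0] * alpha
    remaining = alpha
    slots = 0
    while remaining:
        slots += 1
        j = rng.randrange(K)  # indices alpha..K-1 stand for the dummy queue
        if j < alpha:
            counts[j] += 1
            if counts[j] == m:
                remaining -= 1
    return slots

def compare(alpha, K, m, runs=20000, seed=5):
    """Return (simulated mean number of slots, K * E[max of alpha Gamma(m, 1)])."""
    rng = random.Random(seed)
    simulated = sum(trials_until_served(alpha, K, m, rng) for _ in range(runs)) / runs
    predicted = K * sum(max(rng.gammavariate(m, 1.0) for _ in range(alpha))
                        for _ in range(runs)) / runs
    return simulated, predicted

if __name__ == "__main__":
    print(compare(alpha=3, K=10, m=2))
```

The two numbers agree up to Monte-Carlo noise, which illustrates why the expected number of trials reduces to the mean of a maximum of i.i.d. variables.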



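Using this result, the average number of trials required is given by

$$E[N_t|X_{min}] = \max\left\{ \Theta\left(\binom{N}{N/\alpha}\log\alpha\right),\ \Theta\left(\binom{N}{N/\alpha}\frac{X_{min}}{T_c}\right) \right\}.$$

Thus the average delay of the general static scheduling scheme can be lower bounded by

$$D \ge E_{X_{min}}\left[ E[N_t|X_{min}]\, T_c \right] = E_{X_{min}}\left[ \max\left\{ \Theta\left(\binom{N}{N/\alpha} T_c \log\alpha\right),\ \Theta\left(\binom{N}{N/\alpha} X_{min}\right) \right\} \right].$$

Since $E[\max\{Z_1, Z_2\}] \ge \max\{E[Z_1], E[Z_2]\}$, we have

$$D = \max\left\{ \Omega\left(\binom{N}{N/\alpha} T_c \log\alpha\right),\ \Omega\left(\binom{N}{N/\alpha} E[X_{min}]\right) \right\}$$

$$\Rightarrow D = \max\left\{ \Omega\left(\binom{N}{N/\alpha} \frac{\log\alpha}{\log\log N}\right),\ \Omega\left(\binom{N}{N/\alpha} E[X_{min}]\right) \right\}. \qquad (32)$$

Moreover, when $E[X_{min}] = \Theta(\bar{X})$, it can be easily seen that the expression on the right in (32)
gives the exact scaling of the average delay D, instead of just being a lower bound for it.

B Proof of Lemma 4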

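The average throughput of the worst user scheme is given by

$$R_{tot} = N \int_0^\infty \log(1 + xP)\, N e^{-Nx}\, dx = -N e^{N/P}\, \mathrm{Ei}\left(-\frac{N}{P}\right). \qquad (33)$$

For large values of x, we have

$$\mathrm{Ei}(-x) = \int_{-\infty}^{-x} \frac{e^{t}}{t}\, dt = -\frac{e^{-x}}{x}(1 + \epsilon),$$

where $\epsilon \to 0$ as $x \to \infty$. Using this in (33), we get

$$R_{tot} = P(1 + \epsilon) = \Theta(1).$$

Letting α = 1 in (5) of Theorem 3, we get the average delay as^

$$D = \max\left\{ \Omega\left(\frac{1}{\log\log N}\right),\ \Omega\left(E[X_{min}]\right) \right\}. \qquad (34)$$

Since the base station maintains only a single queue for the worst user scheme, we have

$$E[X_{min}] = \bar{X} = \Theta\left(\frac{1}{E[R_1]}\right).$$

The average service rate $E[R_1]$ is given by

$$E[R_1] = \int_0^\infty \log(1 + xP)\, N e^{-Nx}\, dx = -e^{N/P}\, \mathrm{Ei}\left(-\frac{N}{P}\right) = \Theta\left(\frac{1}{N}\right).$$

Since $E[X_{min}] = \bar{X}$, the expression on the right in (34) gives the exact scaling of D. Thus the
average delay of the worst user scheme scales as $D = \Theta(N)$.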
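The Θ(1) throughput of the worst user scheme can be verified numerically without special functions. The sketch below assumes the minimum of N unit-mean exponential gains is exponential with rate N; after integration by parts and the substitution u = Nx, the throughput becomes the integral of $P e^{-u}/(1 + uP/N)$ over u ≥ 0, which is evaluated here with a simple midpoint rule. The step count and truncation point are illustrative choices.

```python
import math

def worst_user_rtot(N, P=1.0, steps=200000, u_max=60.0):
    """Midpoint-rule value of the integral of P * exp(-u) / (1 + u * P / N) over [0, u_max],
    which equals N * E[log(1 + x P)] for x ~ Exp(N) up to truncation error."""
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += P * math.exp(-u) / (1.0 + u * P / N)
    return h * total

if __name__ == "__main__":
    for N in (10, 100, 10000):
        print(N, round(worst_user_rtot(N), 5))
```

The values increase towards P as N grows, consistent with $R_{tot} = P(1 + \epsilon)$.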

C Proof of Lemma 5 

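By letting α = N in (4) of Theorem 3, the average throughput of the best user scheme is found to
be

$$R_{tot} = \sum_{k=1}^{N} \binom{N}{k} (-1)^{k} e^{k/P}\, \mathrm{Ei}\left(-\frac{k}{P}\right). \qquad (35)$$

It has been shown in [19] that the throughput in (35) scales as

$$R_{tot} = \Theta(\log\log N) \qquad (36)$$

with the number of users N. Hence the average service rate is given by

$$E[R_N] = R_{tot} = \Theta(\log\log N).$$

^Note that $\Omega\left(\binom{N}{N/\alpha}\frac{\log\alpha}{\log\log N}\right) = \Omega\left(\frac{1}{\log\log N}\right)$ when α = 1, since for any constant k, $k + \log\alpha = \Theta(\log\alpha)$ $\forall\, 1 \le \alpha \le N$.

Letting α = N in (5) of Theorem 3, we get the average delay as

$$D = \max\left\{ \Omega\left(\frac{N\log N}{\log\log N}\right),\ \Omega\left(N E[X_{min}]\right) \right\}. \qquad (37)$$

Now

$$E[X_{min}] = E\left[\min_{i=1}^{N} X_i\right] \le \bar{X} = \Theta\left(\frac{1}{E[R_N]}\right).$$

Hence

$$N E[X_{min}] = O\left(\frac{N}{\log\log N}\right).$$

Thus, from (37), the average delay of the best user scheme scales as

$$D = \Omega\left(\frac{N\log N}{\log\log N}\right).$$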



D Proof of Lemma 6 



The average throughput of the median user scheme can be derived by letting α = 2 in (4) of
Theorem 3. From the results on central order statistics in [20] (Theorem 8.5.1), we know that
the sample median of i.i.d. exponential random variables converges in distribution to a normal
random variable with mean θ and variance (1/N), where θ = log 2 is the median of the underlying
exponential distribution. Hence

$$\sqrt{N}\left(|h_{\pi(\frac{N}{2}+1)}|^2 - \theta\right) \to Z \quad \text{in distribution},$$

where Z is a standard normal random variable. Using Chebyshev's inequality, we get $\forall \epsilon > 0$

$$\Pr\left(\left||h_{\pi(\frac{N}{2}+1)}|^2 - \theta\right| > \epsilon\right) = \Pr\left(\sqrt{N}\left||h_{\pi(\frac{N}{2}+1)}|^2 - \theta\right| > \epsilon\sqrt{N}\right) \le \frac{1}{\epsilon^2 N} \to 0 \quad \text{as } N \to \infty$$

$$\Rightarrow |h_{\pi(\frac{N}{2}+1)}|^2 \to \theta \quad \text{in probability}.$$

Since the log(.) function is continuous, we have

$$\log\left(1 + |h_{\pi(\frac{N}{2}+1)}|^2 P\right) \to \log(1 + \theta P) \quad \text{in probability}. \qquad (38)$$
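The concentration of the median gain around θ = log 2 is easy to observe empirically. The following sketch assumes unit-mean exponential gains and takes the (N/2 + 1)-th smallest gain as the median user's gain; the function names and run counts are illustrative.

```python
import math
import random

def median_gain(N, rng):
    """(N/2 + 1)-th smallest of N unit-mean exponential gains."""
    gains = sorted(rng.expovariate(1.0) for _ in range(N))
    return gains[N // 2]

def spread_of_median(N, runs=1000, seed=6):
    """Empirical mean and variance of the median gain over independent runs."""
    rng = random.Random(seed)
    draws = [median_gain(N, rng) for _ in range(runs)]
    mean = sum(draws) / runs
    var = sum((d - mean) ** 2 for d in draws) / runs
    return mean, var

if __name__ == "__main__":
    for N in (20, 100, 500):
        mean, var = spread_of_median(N)
        print(N, round(mean, 4), round(var, 5), round(math.log(2), 4))
```

The empirical variance decays roughly like 1/N while the mean settles near log 2, matching the convergence in probability used above.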

We now derive a lower bound on the average throughput. We recall the following property of
positive random variables. Let $\{X_n\}$ be a sequence of positive random variables converging to a constant
A in probability. Hence $\forall \epsilon > 0$,

$$\Pr(|X_n - A| > \epsilon) < \delta,$$

for some small δ > 0. Now

$$E[X_n] = \int_0^\infty t f_{X_n}(t)\,dt \ge \int_{A-\epsilon}^{\infty} t f_{X_n}(t)\,dt \ge (A - \epsilon)(1 - \delta).$$

Taking the limit as $n \to \infty$, we get

$$\lim_{n\to\infty} E[X_n] \ge A. \qquad (39)$$

Using this property in (38), we get

$$\lim_{N\to\infty} E\left[\log\left(1 + |h_{\pi(\frac{N}{2}+1)}|^2 P\right)\right] \ge \log(1 + \theta P) = \Theta(1)$$

$$\Rightarrow R_{tot} = \left(\frac{N}{2}\right) E\left[\log\left(1 + |h_{\pi(\frac{N}{2}+1)}|^2 P\right)\right] = \Omega(N). \qquad (40)$$

An upper bound on the average throughput of any scheduling scheme is given by

$$R_{tot} \le E\left[\sum_{i=1}^{N}\log(1 + |h_i|^2 P)\right] = N E\left[\log(1 + |h_1|^2 P)\right].$$

Hence

$$R_{tot} = O(N). \qquad (41)$$

Combining this with the lower bound in (40), we get

$$R_{tot} = \Theta(N).$$

Thus it is clear that the throughput of the proposed median user scheme is scaling law optimal.
Letting α = 2 in (5) of Theorem 3, we get the average delay as

$$D = \max\left\{ \Omega\left(\binom{N}{N/2}\frac{1}{\log\log N}\right),\ \Omega\left(\binom{N}{N/2} E[X_{min}]\right) \right\}. \qquad (42)$$

Now, since $X_{min} = \min\{X_1, X_2\}$, we have

$$E[X_{min}] = \Theta(\bar{X}) = \Theta\left(\frac{1}{E[R_2]}\right).$$

The average service rate $E[R_2]$ is given by

$$E[R_2] = E\left[\log\left(1 + |h_{\pi(\frac{N}{2}+1)}|^2 P\right)\right] = \Theta(1). \qquad (43)$$

Since $E[X_{min}] = \Theta(\bar{X})$, the expression on the right in (42) gives the exact scaling of D. Thus the
average delay of the median user scheme scales as

$$D = \Theta\left(\binom{N}{N/2}\right).$$

Using Stirling's formula, we obtain the scaling of the average delay as given in (11).

E Proof of Theorem 7 

Let $A_i$ denote the event that a packet is successfully decoded by all the N users in the system in i
transmission attempts. Following the notation in [18], we define

$$q(m) = \Pr(\bar{A}_1, \dots, \bar{A}_{m-1}, A_m) = p(m-1) - p(m),$$

where

$$p(m) = \Pr(\bar{A}_1, \dots, \bar{A}_{m-1}, \bar{A}_m) = 1 - \sum_{l=1}^{m} q(l)$$

with p(0) = 1. The rate R is defined as R = (b/L). We define the random variable τ to be the
number of transmission attempts made between the instant when the codeword is generated and
the instant when its transmission is stopped (transmission is stopped either when the packet is
successfully decoded by all the users or when the number of transmission attempts exceeds the rate
constraint M). The probability distribution of τ is given by

$$f_\tau(m) = \begin{cases} 0, & m = 0 \\ q(m), & 1 \le m \le M-1 \\ q(M) + p(M), & m = M \end{cases}$$

We define the random reward $\mathcal{R}$ as follows: $\mathcal{R} = NR$ if transmission stops because of successful
decoding and $\mathcal{R} = 0$ if transmission stops because of the rate constraint violation. Hence

$$E[\mathcal{R}] = NR \sum_{m=1}^{M} q(m) = NR[1 - p(M)].$$

The mean inter-renewal time is given by

$$E[\tau] = \sum_{m=1}^{M} m f_\tau(m) = \sum_{m=1}^{M} m\,q(m) + M p(M) = \sum_{m=1}^{M} m[p(m-1) - p(m)] + M p(M) = \sum_{m=0}^{M-1} p(m).$$

Applying the renewal-reward theorem, we obtain the average throughput of the proposed scheme
as

$$R_{tot} = \frac{E[\mathcal{R}]}{E[\tau]} \quad \text{with probability 1}.$$

Hence

$$R_{tot} = \frac{NR[1 - p(M)]}{\sum_{m=0}^{M-1} p(m)}.$$

The average delay D of the scheme is given by the mean inter-renewal time. Hence

$$D = \sum_{m=0}^{M-1} p(m).$$

The unconstrained throughput and delay of this scheme are obtained by letting $M \to \infty$ and are
given by

$$R_{tot} = \frac{NR}{\sum_{m=0}^{\infty} p(m)} \qquad (44)$$

and

$$D = \sum_{m=0}^{\infty} p(m). \qquad (45)$$
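The renewal-reward expressions can be illustrated with a direct simulation of the unconstrained incremental redundancy scheme. The sketch below assumes independent unit-mean exponential fading power gains across users and transmission attempts, Gaussian inputs so that each attempt contributes $\log(1 + |h|^2 P)$ to a user's accumulated mutual information, and rates measured in nats. The function names and run counts are illustrative.

```python
import math
import random

def attempts_until_all_decode(N, R, rng, P=1.0):
    """Number of transmission attempts until every user has accumulated at least R nats."""
    accumulated = [0.0] * N
    m = 0
    while min(accumulated) < R:
        m += 1
        for k in range(N):
            accumulated[k] += math.log(1.0 + rng.expovariate(1.0) * P)
    return m

def ir_delay_and_throughput(N, R, runs=500, seed=7):
    """Average delay (mean inter-renewal time) and throughput N * R / delay."""
    rng = random.Random(seed)
    delay = sum(attempts_until_all_decode(N, R, rng) for _ in range(runs)) / runs
    return delay, N * R / delay

if __name__ == "__main__":
    for N in (10, 50, 200):
        print(N, ir_delay_and_throughput(N, 1.0))
```

The average number of attempts grows slowly with N, so the throughput NR/D grows almost linearly in N, as the scaling results of the theorem suggest.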



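From the earlier definitions, we have

$$p(m) = \Pr(\bar{A}_1, \dots, \bar{A}_{m-1}, \bar{A}_m) = \Pr(\bar{A}_m) = \Pr\left( \min_{i=1}^{N} \sum_{k=1}^{m} I(X; Y_{i,k}) < R \right)$$

$$\Rightarrow p(m) = 1 - \Pr\left( \min_{i=1}^{N} \sum_{k=1}^{m} I(X; Y_{i,k}) \ge R \right) = 1 - \left[ 1 - \Pr\left( \sum_{k=1}^{m} I(X; Y_{k}) < R \right) \right]^{N}.$$

Now for a Gaussian input distribution, we have

$$\sum_{k=1}^{m} I(X; Y_k) = \sum_{k=1}^{m} \log(1 + |h_k|^2).$$

We know that

$$\log\left(1 + \sum_{k=1}^{m} |h_k|^2\right) \le \sum_{k=1}^{m} \log(1 + |h_k|^2) \le \sum_{k=1}^{m} |h_k|^2.$$

Hence

$$\Pr\left(\sum_{k=1}^{m}|h_k|^2 < e^R - 1\right) \ge \Pr\left(\sum_{k=1}^{m}\log(1+|h_k|^2) < R\right) \ge \Pr\left(\sum_{k=1}^{m}|h_k|^2 < R\right). \qquad (46)$$

Since both R and $(e^R - 1)$ are constants, substituting both the lower and upper bounds in (46) will
yield the same scaling with N. So we consider only the lower bound on p(m). Let

$$s(m) = 1 - \left[1 - \Pr\left(\sum_{k=1}^{m}|h_k|^2 < R\right)\right]^{N}.$$

Hence

$$\sum_{m=0}^{\infty} p(m) = \Theta\left(\sum_{m=0}^{\infty} s(m)\right) \quad \text{w.r.t. } N.$$

The random variable $\sum_{k=1}^{m}|h_k|^2$ has a 2m-dimensional Chi-square distribution with the density and
distribution functions given by

$$f(x) = \frac{x^{m-1}e^{-x}}{(m-1)!}, \quad x \ge 0$$

and

$$F(x) = 1 - e^{-x}\left(\sum_{l=0}^{m-1}\frac{x^l}{l!}\right), \quad x \ge 0.$$

Hence

$$s(m) = 1 - \left[ e^{-R}\sum_{l=0}^{m-1}\frac{R^l}{l!} \right]^{N}.$$

From Taylor's theorem, we know that (for some 0 < ε < 1)

$$\sum_{l=0}^{m-1}\frac{R^l}{l!} = e^{R} - \frac{e^{\epsilon R}R^m}{m!}$$

$$\Rightarrow s(m) = 1 - \left[1 - \frac{e^{-(1-\epsilon)R}R^m}{m!}\right]^{N}.$$

To find the scaling of $\sum_{m=0}^{\infty} s(m)$ w.r.t. N, we first derive a lower bound by finding the value of m
until which $s(m) \to 1$ as $N \to \infty$. Now

$$s(m) \to 1 \iff \left[1 - \frac{e^{-(1-\epsilon)R}R^m}{m!}\right]^{N} \to 0.$$

Using Stirling's approximation, we have

$$\frac{e^{-(1-\epsilon)R}R^m}{\sqrt{2\pi m}\,e^{-m}m^m} > \frac{k}{N}, \quad \forall \text{ constant } k.$$

Taking log on both sides, we get

$$m\log\left(\frac{m}{eR}\right) + \frac{1}{2}\log(2\pi m) + (1-\epsilon)R < \log N - \log k, \quad \forall k.$$

For large N, this equation can be reduced to

$$m\log m < \log N. \qquad (47)$$

This equation is satisfied by all values of m such that

$$m < \Theta\left(\frac{\log N}{\log\log N}\right).$$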
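The threshold implied by (47) can be computed directly. The short sketch below finds the largest integer m with m log m ≤ log N and compares it with log N / log log N; the function names are illustrative, and natural logarithms are assumed.

```python
import math

def largest_m(N):
    """Largest integer m >= 1 satisfying m * log(m) <= log(N)."""
    m = 1
    while (m + 1) * math.log(m + 1) <= math.log(N):
        m += 1
    return m

def scaling_ratio(N):
    """Ratio of the exact threshold to log N / log log N."""
    return largest_m(N) / (math.log(N) / math.log(math.log(N)))

if __name__ == "__main__":
    for exponent in (6, 20, 100):
        N = 10 ** exponent
        print(exponent, largest_m(N), round(scaling_ratio(N), 3))
```

The ratio stays within a bounded range as N grows, which is what the Θ(log N / log log N) statement asserts.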

Since $s(m) \to 1$ as $N \to \infty$ for all values of m that satisfy the above equation, the sum of s(m)'s
can be lower bounded as

$$\sum_{m=0}^{\infty} s(m) \ge \Theta\left(\frac{\log N}{\log\log N}\right). \qquad (48)$$

Similarly, an upper bound on $\sum_{m=0}^{\infty} s(m)$ can be derived by finding the value of m from which
$s(m) \to 0$ as $N \to \infty$. Following the same procedure as before, we find that $s(m) \to 0$ when

$$m > \Theta\left(\frac{\log N}{\log\log N}\right).$$

This yields the following upper bound

$$\sum_{m=0}^{\infty} s(m) \le \Theta\left(\frac{\log N}{\log\log N}\right).$$

Combining this with the lower bound in (48), we get

$$\sum_{m=0}^{\infty} s(m) = \Theta\left(\frac{\log N}{\log\log N}\right).$$

Thus the average delay is given by

$$D = \sum_{m=0}^{\infty} p(m) = \Theta\left(\sum_{m=0}^{\infty} s(m)\right) = \Theta\left(\frac{\log N}{\log\log N}\right).$$

The average throughput of the incremental redundancy scheme is then given by

$$R_{tot} = \frac{NR}{\sum_{m=0}^{\infty} p(m)} = \frac{NR}{D} = \Theta\left(\frac{N\log\log N}{\log N}\right).$$

F Proof of Theorem 8



The first stage of the cooperation scheme is the median user scheme. Hence it is clear from (10)
that

$$E[R_{s1}] = E\left[\log\left(1 + |h_{\pi(\frac{N}{2}+1)}|^2 P\right)\right] = \Theta(1).$$

As noted earlier, the cooperative transmission by the users in the second stage is equivalent to the
transmission of packets from a transmitter equipped with (N/2) transmit antennas to the worst
user in a group of (N/2) users. Hence the average transmission rate during the cooperative stage
is given by

$$E[R_{s2}] = E\left[\min_{i=1,\dots,(N/2)} \log\left(1 + \frac{\sum_{k=1}^{N/2}|h_{ki}|^2}{(N/2)}\,P\right)\right],$$

where the $|h_{ki}|^2$'s are i.i.d. and exponentially distributed and represent the inter-user fading coefficients.
Hence

$$E[R_{s2}] = E\left[\log\left(1 + \frac{\min_{i=1,\dots,M}|\chi_i^M|^2}{M}\,P\right)\right], \qquad (49)$$

where M = (N/2) and the $|\chi_i^M|^2$'s are Chi-square random variables with 2M degrees of freedom whose
distribution function is given by

$$F(x) = 1 - e^{-x}\left(\sum_{j=0}^{M-1}\frac{x^j}{j!}\right), \quad x \ge 0.$$

Using the results on extreme order statistics in [20] (Theorems 8.3.2-8.3.6), it can be shown that
the random variable

$$\frac{\min_{i=1}^{M}|\chi_i^M|^2}{b_M} \to W \quad \text{in distribution as } M \to \infty,$$

where W is a Weibull type random variable and $b_M$ satisfies $F(b_M) = \frac{1}{M}$. Now

$$F(b_M) = \frac{1}{M} \Rightarrow 1 - e^{-b_M}\sum_{j=0}^{M-1}\frac{b_M^j}{j!} = \frac{1}{M}.$$

Using Taylor's theorem, we get for some $0 < \beta_M < 1$

$$\frac{e^{-(1-\beta_M)b_M}\, b_M^M}{M!} = \frac{1}{M}.$$

Using Stirling's approximation, we have

$$\frac{e^{-(1-\beta_M)b_M}\, b_M^M}{\sqrt{2\pi M}\,M^M e^{-M}} = \frac{1}{M}.$$

Taking log(.) on both sides, we get

$$(1 - \beta_M)b_M - M\log b_M = M - \left(M - \frac{1}{2}\right)\log M + C.$$

Since $\beta_M \to 0$ as $M \to \infty$, we get $b_M = \Theta(M)$. Thus

$$\frac{\min_{i=1}^{M}|\chi_i^M|^2}{M} \to kW \quad \text{in distribution, for some constant } k > 0.$$

Since the log(.) function is continuous, we have

$$\log\left(1 + \frac{\min_{i=1}^{M}|\chi_i^M|^2}{M}P\right) \to \log(1 + kWP) \quad \text{in distribution, as } M \to \infty.$$

Now, we know

$$0 \le \frac{\min_{i=1}^{M}|\chi_i^M|^2}{M} \le \frac{|\chi_1^M|^2}{M}.$$

Since

$$E\left[\left(\frac{|\chi_1^M|^2}{M}P\right)^2\right] = \frac{E\left[(|\chi_1^M|^2)^2\right]P^2}{M^2} = \frac{(M^2 + M)P^2}{M^2} = \left(1 + \frac{1}{M}\right)P^2 \le 2P^2 < \infty \quad \forall M,$$

$$\left\{\frac{|\chi_1^M|^2}{M}P;\ M \ge 1\right\} \text{ is uniformly integrable}$$

$$\Rightarrow \left\{\log\left(1 + \frac{\min_{i=1}^{M}|\chi_i^M|^2}{M}P\right);\ M \ge 1\right\} \text{ is uniformly integrable}.$$

It is shown in [21] that if a sequence of random variables $\{X_n\}$ is uniformly integrable and $X_n \to X$
in distribution as $n \to \infty$, then $E X_n \to E X$ as $n \to \infty$. Thus

$$E\left[\log\left(1 + \frac{\min_{i=1}^{M}|\chi_i^M|^2}{M}P\right)\right] \to E[\log(1 + kWP)] = \Theta(1).$$

Hence the average transmission rate of the second stage is given by $E[R_{s2}] = \Theta(1)$ w.r.t. N. Since
both $E[R_{s1}]$ and $E[R_{s2}]$ do not scale with N and since the minimum is taken over only two positive
quantities, we have

$$E[\min\{R_{s1}, R_{s2}\}] = \Theta(E[R_{s1}]) = \Theta(1).$$

Thus the average throughput of the cooperation scheme is given by

$$R_{tot} = \left(\frac{N}{2}\right) E[\min\{R_{s1}, R_{s2}\}] = \Theta(N).$$
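The claim that the second stage rate does not vanish with N can be probed numerically. The sketch below estimates $E[\log(1 + \min_i |\chi_i^M|^2 P / M)]$ by Monte-Carlo, modelling each $|\chi_i^M|^2$ as a Gamma(M, 1) variable (a sum of M unit-mean exponentials). The function name, run count and seed are illustrative assumptions.

```python
import math
import random

def second_stage_rate(M, P=1.0, runs=300, seed=8):
    """Monte-Carlo estimate of E[log(1 + min_i gain_i / M * P)] over M cooperating users."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        worst = min(rng.gammavariate(M, 1.0) for _ in range(M)) / M
        total += math.log(1.0 + worst * P)
    return total / runs

if __name__ == "__main__":
    for M in (10, 50, 200):
        print(M, round(second_stage_rate(M), 4))
```

For P = 1 the estimates remain bounded away from zero and below log 2 over a wide range of M, in agreement with $E[R_{s2}] = \Theta(1)$.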



We now determine the average delay of the cooperation scheme. We note that the base station
needs to maintain only a single queue that caters to all the N users in the system. The information
transmitted by the base station in the first half of each time slot reaches all the N users at the
end of that time slot. Hence the average delay is equal to the average service time required for
transmitting a packet of size S from the queue. Following the steps in Appendix A, the average
delay D for transmitting a packet in the cooperation scheme is given by

$$D = T_c + \mu S = \Theta\left(\frac{1}{E[\min\{R_{s1}, R_{s2}\}]}\right) = \Theta(1).$$

G Proof of Theorem 9



We first extend the proof of the general static scheduling scheme given in Appendix A to the
multi-group scenario and then consider the three special cases. The average throughput $R_{tot}$ of the general
multi-group scheduling scheme is given by

$$R_{tot} = \left(\frac{N}{\alpha}\right) E[R_\alpha],$$

where Ra represents the transmission rate to each of the intended {N/ a) users and is given by 

where the distribution of is given by 

F,,ix)=( E f^)(l-e--)'e-(^-^)^) ,yx>0. (50) 

Hence the average throughput is given by 

Rtot = - / log(l + xP)dFi,g{x). (51) 

a Jo 

Integrating by parts and simplifying, we obtain the average throughput of the proposed scheme. 

For implementing the general multi-group static scheduling scheme, the base station needs to
maintain $G\binom{N}{N/\alpha}$ queues, one for each combination of (N/α) users in each of the G groups. These
queues can be divided into G sets, one for each of the groups. Within each set corresponding to
a particular group, the queues can be further divided into subsets with α coupled queues in each
subset such that the combinations of users served by the α queues within a subset are mutually
exclusive and collectively exhaustive (i.e., every user in the particular group is served by exactly
one of the α queues). We consider one such subset of α queues corresponding to any one of the
G groups. Any packet that arrives into the subset enters all the α queues since it needs to be
transmitted to all the users within the group. At any instant of time, the base station services only
one of the queues.

As before, we first calculate the average service time $\bar{X}$ required for transmitting a packet by
assuming that the base station always services the same queue. Following the steps in Appendix A,
the average service time $\bar{X}$ is given by

$$\bar{X} = T_c + \mu S = \Theta\left(\frac{1}{E[R_\alpha]}\right).$$

We again use the results in [14] to derive a lower bound on the actual delay by considering the
minimum number of trials $N_t$ required to ensure that all the α queues are served at least $(X_{min}/T_c)$
times by the base station. As before, we consider only (α + 1) queues with the $(\alpha + 1)$-th queue
being the "dummy" queue representing all the queues in all other subsets in the system. Now the
probabilities $\{p_i\}$ of the server choosing the $i$-th queue are given by

$$p_1 = \cdots = p_\alpha = \frac{1}{G\binom{N}{N/\alpha}}, \qquad p_{\alpha+1} = 1 - \frac{\alpha}{G\binom{N}{N/\alpha}}.$$

Proceeding as in Appendix A, the MGF of the number of trials required is given by



f 

Jo 



{t)e- 



dt. 



The average number of trials required E[Nt\Xmin] is given by 

■ N ^ 



N/a J Jo 



1 — S^x^^{t)e~ 



dt = G{ \e 
\N/a) 



max Yj 

l<i<a 



where the 1^'s are i.i.d random variables that follow a Chi-square distribution with {2Xmin/Tc) 
degrees of freedom. Using the result in (31), we get 

^[iV.|X„.„l ^n.ax{e {g(^] log„) .9 (g(^M^)} . 



Thus the average delay of the proposed scheme can be lower bounded as 
D > £x„,. lE{N,\X^tn]Tc] = Ex, 



\ \\N/aJ\og\ogNGj ' \ \N/ 



(52) 



Moreover, when E[Xjnin] = © (^)' expression on the right in (52) gives the exact scaling of the 
average delay D, instead of just being a lower bound for it. 

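G.1 Best among worst users scheme (α = 1)

Letting α = 1 in (51) and simplifying, we get the average throughput to be

$$R_{tot} = N \sum_{k=1}^{G} \binom{G}{k} (-1)^{k} e^{kN/P}\, \mathrm{Ei}\left(-\frac{kN}{P}\right). \qquad (53)$$

For large values of x, we have

$$\mathrm{Ei}(-x) = \int_{-\infty}^{-x} \frac{e^{t}}{t}\, dt = -\frac{e^{-x}}{x}(1 + \epsilon),$$

where $\epsilon \to 0$ as $x \to \infty$. Using this in (53), we get

$$R_{tot} = P\left[\sum_{k=1}^{G} \binom{G}{k} \frac{(-1)^{k+1}}{k}\right](1 + \epsilon).$$

It can be shown using the results in [22] that

$$R_{tot} = P\left[\sum_{k=1}^{G} \frac{1}{k}\right](1 + \epsilon) = \Theta(\log G).$$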
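The step from the alternating binomial sum to the harmonic sum rests on a standard identity, which can be verified exactly with rational arithmetic. The sketch below checks that $\sum_{k=1}^{G} \binom{G}{k} (-1)^{k+1}/k$ equals $\sum_{k=1}^{G} 1/k$ and that the latter grows like log G; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, log

def alternating_sum(G):
    """Exact value of sum_{k=1}^{G} C(G, k) * (-1)^(k+1) / k."""
    return sum(Fraction((-1) ** (k + 1) * comb(G, k), k) for k in range(1, G + 1))

def harmonic(G):
    """Exact value of the G-th harmonic number."""
    return sum(Fraction(1, k) for k in range(1, G + 1))

if __name__ == "__main__":
    for G in (1, 5, 30):
        print(G, alternating_sum(G) == harmonic(G), float(harmonic(G)), log(G))
```

Since the harmonic number differs from log G by a bounded amount, the Θ(log G) scaling of the throughput follows.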
The average service rate is given by

$$E[R_1] = \frac{R_{tot}}{N} = \Theta\left(\frac{\log G}{N}\right).$$

Letting α = 1 in (52) and using the fact that

$$E[X_{min}] = \bar{X} = \Theta\left(\frac{1}{E[R_1]}\right),$$

we get the average delay as



G.2 Best among best users scheme (α = N)

Letting α = N in (51) and simplifying, we get the average throughput to be

$$R_{tot} = \sum_{k=1}^{NG} \binom{NG}{k} (-1)^{k} e^{k/P}\, \mathrm{Ei}\left(-\frac{k}{P}\right). \qquad (54)$$

It has been shown in [19] that the throughput in (54) scales as

$$R_{tot} = \Theta(\log\log NG). \qquad (55)$$

Hence the average service rate is given by

$$E[R_N] = R_{tot} = \Theta(\log\log NG).$$

Letting α = N in (52) and using the fact that

$$E[X_{min}] \le \bar{X} = \Theta\left(\frac{1}{E[R_N]}\right),$$

we get the average delay as

$$D = \Omega\left(\frac{NG\log N}{\log\log NG}\right).$$



G.3 Best among median users scheme (α = 2)

The average throughput of the best among median users scheme can be derived by letting α = 2 in
(51). It is given by

$$R_{tot} = \left(\frac{N}{2}\right) E\left[\log\left(1 + \max_{g=1}^{G}|h^{g}_{\pi(\frac{N}{2}+1)}|^2 P\right)\right].$$

We now determine bounds on the asymptotic scaling of $R_{tot}$ as N and G grow to infinity. To get a
lower bound on the throughput, we use the fact that $\max\{X_1, \dots, X_n\} \ge X_1$ and obtain

$$R_{tot} \ge \left(\frac{N}{2}\right) E\left[\log\left(1 + |h^{1}_{\pi(\frac{N}{2}+1)}|^2 P\right)\right].$$

The expression on the right is clearly the throughput of the median user scheduler described in
Section 3.1. Hence, from Lemma 6, we get

$$R_{tot} = \Omega(N).$$

We derive an upper bound on the throughput by using the fact that for continuous unimodal
distributions

$$|\text{Median} - \text{Mean}| \le \text{Standard Deviation}.$$

Hence^ 



i^7r(f +i)r < 



+ h \h 



N\ 



N 



+ 



\ 



\hi\^ + --- + \h 



N\ 



\h\^ + ---+\h 



N\ 



E 



max \n' 



ri'vif+i)! 



< E 



G 

max 

9=1 



N 



\h{\^ + --- + \hl 



N 



N 



+ 



N 



Using Jensen's inequality and the fact that m.ax^^i{Xi + 1^} < max"^i{Xj} + maa;"=i{Fi}, we get 
E 



max |/iLiv, iJ 

g=\ ' 7r(3- + l)l 



< E 



G 

max ■ 

9=1 



N\ 



N 



+ 



E 



G_\\h{\'' + --- + \h% 



max 

9=1 



N 



1 ^ 
< -YE 



G 

max 

. 9=1 



+ 



N 



max 

. 9=1 



E 



max I /i? I 

.9=1 



+ 



\ 



E 



{ rnkx. I /i? p 

V9=l 



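It is known that for a sequence of exponential random variables $\{X_i\}$ with unit mean [20],

$$E\left[\max_{i=1}^{G} X_i\right] = \Theta(\log G) \quad \text{and} \quad E\left[\left(\max_{i=1}^{G} X_i\right)^2\right] = \Theta\left((\log G)^2\right).$$

Thus

$$E\left[\max_{g=1}^{G}|h^{g}_{\pi(\frac{N}{2}+1)}|^2\right] = O(\log G).$$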
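The Θ(log G) growth of the expected maximum can be confirmed numerically. The sketch below uses the known fact that the mean of the maximum of G i.i.d. unit-mean exponential variables equals the G-th harmonic number, and compares it with a Monte-Carlo estimate; the function names and run count are illustrative.

```python
import random

def mean_max_exponential(G, runs=20000, seed=9):
    """Monte-Carlo estimate of E[max of G i.i.d. unit-mean exponential variables]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += max(rng.expovariate(1.0) for _ in range(G))
    return total / runs

def harmonic_number(G):
    """G-th harmonic number, which grows like log G."""
    return sum(1.0 / k for k in range(1, G + 1))

if __name__ == "__main__":
    for G in (5, 50):
        print(G, round(mean_max_exponential(G), 3), round(harmonic_number(G), 3))
```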



Now applying Jensen's inequality, we get the upper bound on the average throughput as

$$R_{tot} = \left(\frac{N}{2}\right) E\left[\log\left(1 + \max_{g=1}^{G}|h^{g}_{\pi(\frac{N}{2}+1)}|^2 P\right)\right] \le \left(\frac{N}{2}\right) \log\left(1 + E\left[\max_{g=1}^{G}|h^{g}_{\pi(\frac{N}{2}+1)}|^2\right] P\right) \qquad (56)$$

$$\Rightarrow R_{tot} = O(N\log\log G).$$

Thus the average throughput of the best among median users scheme can be bounded as

$$\Omega(N) = R_{tot} = O(N\log\log G).$$

The average service rate $E[R_2]$ is then bounded by

$$\Omega(1) = E[R_2] = O(\log\log G).$$

^It is easy to show using convergence arguments that the inequality is valid for the empirical values used in the
proof.

Letting α = 2 in (52) and using the fact that

$$E[X_{min}] = \Theta(\bar{X}) = \Theta\left(\frac{1}{E[R_2]}\right),$$

the average delay can be bounded as

Using Stirling's formula, we obtain bounds on the average delay as stated in (21).



H Proof of Theorem 10 

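The average throughput of the multi-group cooperation scheme is given by

$$R_{tot} = E\left[\max_{g=1}^{G}\left\{\left(\frac{N}{2}\right)\min\{R^{g}_{s1}, R^{g}_{s2}\}\right\}\right].$$

Since $E[\max\{X_1, \dots, X_n\}] \ge E[X_1]$, we have

$$R_{tot} \ge E\left[\left(\frac{N}{2}\right)\min\{R^{1}_{s1}, R^{1}_{s2}\}\right].$$

The expression on the right is the average throughput of the single group cooperation scheme
described in Section 3.3. Using the results of Theorem 8, we have

$$R_{tot} = \Omega(N).$$

The average throughput can be upper bounded by using the fact that

$$E\left[\max_{g=1}^{G}\left\{\left(\frac{N}{2}\right)\min\{R^{g}_{s1}, R^{g}_{s2}\}\right\}\right] \le E\left[\max_{g=1}^{G}\frac{N R^{g}_{s1}}{2}\right] = \left(\frac{N}{2}\right) E\left[\log\left(1 + \max_{g=1}^{G}|h^{g}_{\pi(\frac{N}{2}+1)}|^2 P\right)\right].$$

The expression on the right is the average throughput of the best among median users scheme
proposed earlier. Using the results of Theorem 9, we get

$$R_{tot} = O(N\log\log G)$$

$$\Rightarrow \Omega(N) = R_{tot} = O(N\log\log G).$$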

We now determine the average delay of the multi-group cooperation scheme. To implement this
scheme, the base station needs to maintain G queues, one for each group. At the beginning of
each time slot, the base station selects a group according to condition (22). Since we consider a
symmetric scenario, the probability that the base station chooses any particular group is (1/G).
The information transmitted by the base station in the first half of each time slot reaches all the N
users in the selected group at the end of that time slot. Hence the average delay for transmitting a
packet in the multi-group cooperation scheme is given by

where $R_{tot} = (N E[R])/2$. Hence the average delay of the multi-group cooperation scheme can be
bounded as

I Proof of Lemma 11



Prom the results on extreme order statistics in [20], we know that 

|2 



\Xr. 



W in distribution, 



where W has a WeibuU type distribution and 6jv satisfies F(6jv) = j^, which imphes 

^^-1 {LbN)'\ 1 



1-e 



-Lb. 



E 

vfc=0 



A;! / N' 



Using Taylor's theorem, we get for some < < 1 



1 - e~^'"' e^'"' 



1 e-(^-T^)-^''^(L6jv)^ _ 1 



L\ ) N L\ 

Taking log(.) on both sides, we get 

(1 — '^N)LhN — I/log6jv = \ogN + LlogL — log(L!). 

Since |XmmP < IXiP ~ ^(1); know that = 0(1) and hence the logfe^v term dominates the 
left hand side of the above expression. Thus we have 



6^ = e (iv-i) . 



kW in distribution, for some constant A; > 0. 



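Since $E[|X_{min}|^2] \le E[|X_1|^2] < \infty$, we can use the result in Theorem 2.1 of [23] to conclude that

$$E\left[N^{1/L}\,|X_{min}|^2\right] \to kE[W] = \Theta(1) \;\Rightarrow\; E\left[|X_{min}|^2\right] = \Theta\left(N^{-1/L}\right).$$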



The average throughput of the worst user scheme can now be upper bounded using Jensen's inequality as follows

$$R_{tot} = N E\left[\log(1 + |X_{min}|^2 P)\right] \le N\log\left(1 + E\left[|X_{min}|^2\right]P\right)$$

$$\Rightarrow R_{tot} = O\left(N^{\frac{L-1}{L}}\right). \qquad (57)$$

We lower bound the average throughput of the worst user scheme as follows

$$R_{tot} = N\int_0^\infty \log(1 + xP)\,dF_{min}(x) \ge N\int_{b_N}^\infty \log(1 + xP)\,dF_{min}(x)$$

$$\Rightarrow R_{tot} \ge N\log(1 + b_N P)\left[1 - F_{min}(b_N)\right],$$

where

$$F_{min}(x) = 1 - (1 - F(x))^N.$$



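Using the fact that $F(b_N) = \frac{1}{N}$, we get

$$F_{min}(b_N) = 1 - \left(1 - \frac{1}{N}\right)^N$$

$$\Rightarrow R_{tot} \ge N\log(1 + b_N P)\left(1 - \frac{1}{N}\right)^N = \Theta\left(N\log\left(1 + N^{-1/L}P\right)\right).$$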



Combining this with the upper bound in (57), and noting that $\log(1 + x) = \Theta(x)$ as $x \to 0$, we get

$$R_{tot} = \Theta\left(N^{\frac{L-1}{L}}\right).$$
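
The two bounds used in this proof can be checked numerically. The Python sketch below evaluates $R_{tot} = N E[\log(1 + |X_{min}|^2 P)]$ by trapezoidal integration of the tail probability $(1 - F(x))^N$ and compares it with the Jensen upper bound and with the lower bound $N\log(1 + b_N P)(1 - 1/N)^N$. It assumes the unit-mean normalization of $F$ used above, natural logarithms, and an integration range truncated at a hypothetical `x_max`; none of the function names come from the original analysis.

```python
import math

def survival(x, L):
    """1 - F(x) = exp(-L x) * sum_{k<L} (L x)^k / k! (unit-mean normalization)."""
    y = L * x
    return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(L))

def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    s = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        s += f(a + i * h)
    return s * h

def worst_user_bounds(N, L, P, x_max=5.0, steps=50000):
    """Return (lower bound, R_tot, Jensen upper bound) in nats.
    Assumes N is large enough that Pr(X_min > x_max) is negligible."""
    # For g(0) = 0: E[g(X_min)] = integral of g'(x) * Pr(X_min > x) dx,
    # with Pr(X_min > x) = (1 - F(x))^N.
    tail = lambda x: survival(x, L) ** N
    mean_min = trapezoid(tail, 0.0, x_max, steps)
    r_tot = N * trapezoid(lambda x: P / (1.0 + x * P) * tail(x), 0.0, x_max, steps)
    upper = N * math.log(1.0 + mean_min * P)
    # b_N solves F(b_N) = 1/N, i.e. survival(b_N) = 1 - 1/N.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid, L) > 1.0 - 1.0 / N:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    lower = N * math.log(1.0 + b * P) * (1.0 - 1.0 / N) ** N
    return lower, r_tot, upper

if __name__ == "__main__":
    for N in (100, 1000):
        print(N, worst_user_bounds(N, L=2, P=1.0))
```

For L = 2, increasing N by a factor of 10 should increase the computed throughput by roughly $10^{1/2}$, in line with the $N^{(L-1)/L}$ scaling.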



J Proof of Lemma 12 

From the results on extreme order statistics in [20], we know that

$$\frac{|X_{max}|^2 - a_N}{b_N} \to W \quad \text{in distribution},$$

where W has a Gumbel distribution and $a_N$ and $b_N$ satisfy

$$F(a_N) = 1 - \frac{1}{N} \quad \text{and} \quad b_N = \frac{1}{N f(a_N)},$$


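where $f(\cdot)$ denotes the probability density function obtained from (25). Now

$$1 - F(a_N) = \frac{e^{-L a_N}(L a_N)^{L-1}}{(L-1)!}\left(1 + O\left(\frac{1}{a_N}\right)\right) = \frac{1}{N}.$$

Taking log(.) on both sides and simplifying, we get

$$L a_N - (L-1)\log a_N = \log N + (L-1)\log L - \log((L-1)!) + O\left(\frac{1}{a_N}\right)$$

$$\Rightarrow a_N = \frac{\log N + (L-1)\log\log N}{L} + O(\log\log N).$$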
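
The expansion of $a_N$ can be checked numerically. The following Python sketch solves $1 - F(a_N) = 1/N$ by bisection and compares the root with the leading term $(\log N + (L-1)\log\log N)/L$. It assumes the unit-mean normalization $1 - F(x) = e^{-Lx}\sum_{k<L}(Lx)^k/k!$ and natural logarithms; the function names are illustrative only.

```python
import math

def survival(x, L):
    """1 - F(x) = exp(-L x) * sum_{k<L} (L x)^k / k! (unit-mean normalization)."""
    y = L * x
    return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(L))

def solve_a(N, L):
    """Bisection for the a_N that satisfies 1 - F(a_N) = 1/N."""
    lo, hi = 0.0, 1.0
    while survival(hi, L) > 1.0 / N:  # grow the bracket until it contains a_N
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid, L) > 1.0 / N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def leading_term(N, L):
    """(log N + (L - 1) log log N) / L, defined for N >= 3."""
    return (math.log(N) + (L - 1) * math.log(math.log(N))) / L

if __name__ == "__main__":
    for L in (1, 2, 4):
        for N in (10**3, 10**6, 10**9):
            a = solve_a(N, L)
            print(L, N, a, leading_term(N, L), a / leading_term(N, L))
```

For L = 1 the root equals $\log N$ exactly; for larger L the ratio of the root to the leading term should drift toward 1 as N grows.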

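Since

$$f(a_N) = \frac{L e^{-L a_N}(L a_N)^{L-1}}{(L-1)!} = \Theta\left(\frac{1}{N}\right),$$

we have $b_N = \frac{1}{N f(a_N)} = \Theta(1)$. Thus

$$\frac{|X_{max}|^2 - a_N}{b_N} \to W \quad \text{in distribution, with } a_N = \frac{\log N + (L-1)\log\log N}{L} + O(\log\log N) \text{ and } b_N = \Theta(1).$$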
Using Chebyshev's inequality, it is easy to show that

$$\frac{|X_{max}|^2}{\left(\frac{\log N + (L-1)\log\log N}{L}\right)} \to 1 \quad \text{in probability}.$$



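Since any Chi-squared random variable with 2L degrees of freedom can be expressed as the sum of
L i.i.d. exponential random variables, we have

$$|X_{max}|^2 = \max_{i=1}^{N}\left\{\frac{Z_{i,1} + \cdots + Z_{i,L}}{L}\right\} \le \max_{i,j} Z_{i,j},$$

where the $Z_{i,j}$'s are exponential random variables with unit mean. Hence

$$E\left[\frac{|X_{max}|^2}{\left(\frac{\log N + (L-1)\log\log N}{L}\right)}\right] \le E\left[\frac{\max_{i,j} Z_{i,j}}{\left(\frac{\log N + (L-1)\log\log N}{L}\right)}\right] \le \frac{k\log N}{\left(\frac{\log N + (L-1)\log\log N}{L}\right)} \le kL < \infty.$$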



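Thus we can apply the Dominated Convergence Theorem to get

$$E\left[\frac{|X_{max}|^2}{\left(\frac{\log N + (L-1)\log\log N}{L}\right)}\right] \to 1 \;\Rightarrow\; E\left[|X_{max}|^2\right] = \Theta\left(\frac{\log N + (L-1)\log\log N}{L}\right).$$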



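Using Jensen's inequality, we get

$$R_{tot} = E\left[\log(1 + |X_{max}|^2 P)\right] \le \log\left(1 + E\left[|X_{max}|^2\right]P\right)$$

$$\Rightarrow R_{tot} = O\left(\log\left(1 + \frac{\log N + (L-1)\log\log N}{L}\,P\right)\right). \qquad (58)$$

We can lower bound the average throughput of the best user scheme as follows

$$R_{tot} = \int_0^\infty \log(1 + xP)\,dF_{max}(x) \ge \int_{a_N}^\infty \log(1 + xP)\,dF_{max}(x)$$

$$\Rightarrow R_{tot} \ge \log(1 + a_N P)\left[1 - F_{max}(a_N)\right],$$

where $F_{max}(x) = (F(x))^N$. Using the fact that $F(a_N) = 1 - \frac{1}{N}$, we get

$$R_{tot} \ge \log(1 + a_N P)\left[1 - \left(1 - \frac{1}{N}\right)^N\right] = \Theta(\log(1 + a_N P)).$$

Combining this with the upper bound in (58), we get

$$R_{tot} = \Theta\left(\log\left(1 + \frac{\log N + (L-1)\log\log N}{L}\,P\right)\right).$$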
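
A numerical illustration of this sandwich for the best user scheme is sketched below in Python. It evaluates $R_{tot} = E[\log(1 + |X_{max}|^2 P)]$ by trapezoidal integration of the tail probability $1 - F(x)^N$ and compares it with the Jensen upper bound and with the lower bound $\log(1 + a_N P)[1 - (1 - 1/N)^N]$. The unit-mean normalization of $F$, natural logarithms, and the truncation point `x_max` are assumptions of this sketch, and the function names are hypothetical.

```python
import math

def survival(x, L):
    """1 - F(x) = exp(-L x) * sum_{k<L} (L x)^k / k! (unit-mean normalization)."""
    y = L * x
    return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(L))

def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    s = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        s += f(a + i * h)
    return s * h

def best_user_bounds(N, L, P, x_max=60.0, steps=60000):
    """Return (lower bound, R_tot, Jensen upper bound) in nats."""
    def tail(x):
        # Pr(X_max > x) = 1 - (1 - s)^N, computed stably when s is tiny.
        s = survival(x, L)
        if s >= 1.0:
            return 1.0
        return -math.expm1(N * math.log1p(-s))
    mean_max = trapezoid(tail, 0.0, x_max, steps)
    r_tot = trapezoid(lambda x: P / (1.0 + x * P) * tail(x), 0.0, x_max, steps)
    upper = math.log(1.0 + mean_max * P)
    # a_N solves 1 - F(a_N) = 1/N.
    lo, hi = 0.0, 1.0
    while survival(hi, L) > 1.0 / N:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid, L) > 1.0 / N:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    lower = math.log(1.0 + a * P) * (1.0 - (1.0 - 1.0 / N) ** N)
    return lower, r_tot, upper

if __name__ == "__main__":
    for N in (100, 1000):
        print(N, best_user_bounds(N, L=2, P=1.0))
```

The computed throughput should lie between the two bounds and grow slowly with N, consistent with the logarithmic scaling derived above.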
Figure 1: A queuing model for a system with N = 6 users and a = 3



[Plot: throughput versus position of the scheduled user in the ordered SNR list (positions 1 to 10)]

Figure 2: Throughput of the general static scheduling scheme for different positions of the intended
user in the ordered list of SNRs of all users (N=10)
Figure 3: Comparison of the throughput of the proposed schemes
Figure 4: Comparison of the delay of the proposed schemes
Figure 5: Comparison of the throughput of the proposed schemes for G = 5 groups
Figure 6: Comparison of the delay of the proposed schemes for G = 5 groups